Posted to issues@spark.apache.org by "Nicholas Chammas (JIRA)" <ji...@apache.org> on 2015/06/03 20:36:38 UTC
[jira] [Commented] (SPARK-4983) Add sleep() before tagging EC2 instances to allow instance metadata to propagate

    [ https://issues.apache.org/jira/browse/SPARK-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571467#comment-14571467 ] 

Nicholas Chammas commented on SPARK-4983:
-----------------------------------------

Per the discussion on [SPARK-7900], I think we should increase the wait time from the current 5 seconds to, say, 15 or 30 seconds.

An alternative proposed on [SPARK-7900] is to make fewer tagging calls, since the extra calls seem to make it more likely that we get metadata errors from AWS (such as "instance ID not found" right after AWS itself has given us the instance ID).
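
For illustration, a longer wait could also be written as a retry loop instead of a single fixed sleep. The sketch below is hypothetical and is not what {{spark-ec2}} currently does: the helper names {{call_with_retry}} and {{is_instance_not_found}} are made up for this example, the 5-second starting wait simply mirrors the current value, and reading the AWS error code from an {{error_code}} attribute on the exception is an assumption about boto's {{EC2ResponseError}}.

```python
import time


def call_with_retry(action, is_retryable, attempts=5, initial_wait=5.0,
                    sleep=time.sleep):
    """Call action(); on a retryable error, wait and then try again.

    The wait doubles after each failed attempt. Once the attempts are
    used up, or the error is not retryable, the error is re-raised.
    """
    wait = initial_wait
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(wait)
            wait *= 2


def is_instance_not_found(exc):
    # Assumption: the exception exposes the AWS error code as `error_code`.
    # getattr keeps this sketch usable without boto installed.
    return getattr(exc, "error_code", None) == "InvalidInstanceID.NotFound"
```

With something like this, a tagging call such as the {{add_tag}} call in the traceback below could be wrapped as {{call_with_retry(lambda: master.add_tag(key=..., value=...), is_instance_not_found)}}, so that we only wait when AWS actually reports the instance as missing.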

> Add sleep() before tagging EC2 instances to allow instance metadata to propagate
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-4983
>                 URL: https://issues.apache.org/jira/browse/SPARK-4983
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 1.2.0
>            Reporter: Nicholas Chammas
>            Assignee: Gen TANG
>            Priority: Minor
>              Labels: starter
>             Fix For: 1.2.2, 1.3.0
>
>
> We launch EC2 instances in {{spark-ec2}} and then immediately tag them in a separate boto call. Sometimes, EC2 doesn't get enough time to propagate information about the just-launched instances, so when we go to tag them we get a server that doesn't know about them yet.
> This yields the following type of error:
> {code}
> Launching instances...
> Launched 1 slaves in us-east-1b, regid = r-cf780321
> Launched master in us-east-1b, regid = r-da7e0534
> Traceback (most recent call last):
>   File "./ec2/spark_ec2.py", line 1284, in <module>
>     main()
>   File "./ec2/spark_ec2.py", line 1276, in main
>     real_main()
>   File "./ec2/spark_ec2.py", line 1122, in real_main
>     (master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name)
>   File "./ec2/spark_ec2.py", line 646, in launch_cluster
>     value='{cn}-master-{iid}'.format(cn=cluster_name, iid=master.id))
>   File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 80, in add_tag
>     self.add_tags({key: value}, dry_run)
>   File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 97, in add_tags
>     dry_run=dry_run
>   File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 4202, in create_tags
>     return self.get_status('CreateTags', params, verb='POST')
>   File ".../spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1223, in get_status
>     raise self.ResponseError(response.status, response.reason, body)
> boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
> <?xml version="1.0" encoding="UTF-8"?>
> <Response><Errors><Error><Code>InvalidInstanceID.NotFound</Code><Message>The instance ID 'i-585219a6' does not exist</Message></Error></Errors><RequestID>b9f1ad6e-59b9-47fd-a693-527be1f779eb</RequestID></Response>
> {code}
> The solution is to tag the instances in the same call that launches them or, less desirably, to tag the instances after a short wait.


