Posted to user@spark.apache.org by Matt Work Coarr <ma...@gmail.com> on 2014/06/05 22:44:23 UTC

creating new ami image for spark ec2 commands

How would I go about creating a new AMI image that I can use with the spark
ec2 commands? I can't seem to find any documentation.  I'm looking for a
list of steps that I'd need to perform to make an Amazon Linux image ready
to be used by the spark ec2 tools.

I've been reading through the spark 1.0.0 documentation, looking at the
script itself (spark_ec2.py), and looking at the github project
mesos/spark-ec2.

From what I can tell, the spark_ec2.py script looks up the id of the AMI
based on the region and machine type (hvm or pvm) using static content
derived from the github repo mesos/spark-ec2.

The spark ec2 script loads the AMI id from this base url:
https://raw.github.com/mesos/spark-ec2/v2/ami-list
(Which presumably comes from https://github.com/mesos/spark-ec2 )

For instance, since I'm working with us-east-1 and pvm, I'd end up with AMI id:
ami-5bb18832
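
Concretely, that lookup seems to boil down to the following (a minimal
illustration using the same base url, region, and type as above, mirroring
what spark_ec2.py does internally):

import urllib2

AMI_PREFIX = "https://raw.github.com/mesos/spark-ec2/v2/ami-list"
ami_path = "%s/%s/%s" % (AMI_PREFIX, "us-east-1", "pvm")
print urllib2.urlopen(ami_path).read().strip()   # prints: ami-5bb18832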

Is there a list of instructions for how this AMI was created?  Assuming I'm
starting with my own Amazon Linux image, what would I need to do to make it
usable where I could pass that AMI id to spark_ec2.py rather than using the
default spark-provided AMI?

Thanks,
Matt

Re: creating new ami image for spark ec2 commands

Posted by Gianluca Privitera <gi...@studio.unibo.it>.
Hi,
I think the best thing you could do is launch an instance from that AMI,
add the stuff you want to add, save it as a new image through the AWS
console, then launch the ec2 script using the new AMI you just created.
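
If you would rather script that workflow than click through the console,
something like this boto 2.x sketch should work (boto is the library
spark_ec2.py itself uses; it assumes AWS credentials are configured, and
the key name, instance type, and image name are placeholders):

import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

# 1. Launch an instance from the default Spark AMI (us-east-1, pvm).
reservation = conn.run_instances("ami-5bb18832",
                                 key_name="my-keypair",
                                 instance_type="m1.large")
instance = reservation.instances[0]

# 2. Wait for it to boot, ssh in, and install whatever you need (not shown).

# 3. Save the customized instance as a new AMI.
new_ami_id = conn.create_image(instance.id, "my-spark-ami",
                               description="custom spark-ec2 AMI")
print "New AMI id:", new_ami_id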



Re: creating new ami image for spark ec2 commands

Posted by Matt Work Coarr <ma...@gmail.com>.
Thanks Akhil! I'll give that a try!

Re: creating new ami image for spark ec2 commands

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Hi Matt,

You will need the following on the AMI:

1. Java installed
2. Root login enabled
3. /mnt available (since all the storage goes there)

The spark-ec2 script will set up the rest for you; a sketch of baking these
into an Amazon Linux image is below. Let me know if you need any more
clarification on this.
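
A hedged sketch of that preparation: launch your Amazon Linux base image
with a user-data script covering the three points above, then save the
result as a new AMI (for instance with the boto sketch earlier in this
thread). The package name, the authorized_keys edit, and all ids are
assumptions about Amazon Linux defaults; verify them on your own base image.

import boto.ec2

PREP_SCRIPT = r"""#!/bin/bash
# 1. Java installed
yum install -y java-1.7.0-openjdk
# 2. Root login enabled: Amazon Linux prepends a forced command to root's
#    authorized_keys telling you to log in as ec2-user; strip it so the
#    spark-ec2 script can ssh in as root.
sed -i 's/^.*\(ssh-rsa .*\)$/\1/' /root/.ssh/authorized_keys
# 3. /mnt available (spark-ec2 expects its scratch storage there)
mkdir -p /mnt
"""

conn = boto.ec2.connect_to_region("us-east-1")
conn.run_instances("ami-xxxxxxxx",         # your Amazon Linux base AMI
                   key_name="my-keypair",  # placeholder
                   instance_type="m1.large",
                   user_data=PREP_SCRIPT)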



Thanks
Best Regards



Re: creating new ami image for spark ec2 commands

Posted by Matt Work Coarr <ma...@gmail.com>.
Thanks for the response Akhil.  My email may not have been clear, but my
question is about what should be inside the AMI image, not how to pass an
AMI id into the spark_ec2 script.

Should certain packages be installed? Do certain directories need to exist?
etc...



Re: creating new ami image for spark ec2 commands

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can comment out this function and create a new one that returns your
AMI id; the rest of the script will run fine.

# Context used below, from the top of spark_ec2.py: sys, urllib2, stderr,
# and the AMI_PREFIX constant (the base url mentioned earlier in the thread).
import sys
import urllib2
from sys import stderr

AMI_PREFIX = "https://raw.github.com/mesos/spark-ec2/v2/ami-list"


def get_spark_ami(opts):
  # Map each EC2 instance type to its virtualization type; the AMI list in
  # mesos/spark-ec2 is keyed on region and virtualization (pvm or hvm).
  instance_types = {
    "m1.small":    "pvm",
    "m1.medium":   "pvm",
    "m1.large":    "pvm",
    "m1.xlarge":   "pvm",
    "t1.micro":    "pvm",
    "c1.medium":   "pvm",
    "c1.xlarge":   "pvm",
    "m2.xlarge":   "pvm",
    "m2.2xlarge":  "pvm",
    "m2.4xlarge":  "pvm",
    "cc1.4xlarge": "hvm",
    "cc2.8xlarge": "hvm",
    "cg1.4xlarge": "hvm",
    "hs1.8xlarge": "hvm",
    "hi1.4xlarge": "hvm",
    "m3.xlarge":   "hvm",
    "m3.2xlarge":  "hvm",
    "cr1.8xlarge": "hvm",
    "i2.xlarge":   "hvm",
    "i2.2xlarge":  "hvm",
    "i2.4xlarge":  "hvm",
    "i2.8xlarge":  "hvm",
    "c3.large":    "pvm",
    "c3.xlarge":   "pvm",
    "c3.2xlarge":  "pvm",
    "c3.4xlarge":  "pvm",
    "c3.8xlarge":  "pvm"
  }
  if opts.instance_type in instance_types:
    instance_type = instance_types[opts.instance_type]
  else:
    # Unknown instance type: fall back to paravirtual and warn on stderr.
    instance_type = "pvm"
    print >> stderr,\
        "Don't recognize %s, assuming type is pvm" % opts.instance_type

  # Read the AMI id from <AMI_PREFIX>/<region>/<virtualization type>.
  ami_path = "%s/%s/%s" % (AMI_PREFIX, opts.region, instance_type)
  try:
    ami = urllib2.urlopen(ami_path).read().strip()
    print "Spark AMI: " + ami
  except:
    print >> stderr, "Could not resolve AMI at: " + ami_path
    sys.exit(1)

  return ami
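
For example, a minimal hard-coded replacement could look like this (the id
below is a placeholder for your own AMI in the region you launch in):

def get_spark_ami(opts):
  # Skip the lookup entirely and return a custom image id (placeholder).
  return "ami-xxxxxxxx"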

Thanks
Best Regards



Re: creating new ami image for spark ec2 commands

Posted by Nicholas Chammas <ni...@gmail.com>.
Yeah, we badly need new AMIs that include, at a minimum, package/security
updates and Python 2.7. There is an open issue tracking the 2.7 AMI update
<https://issues.apache.org/jira/browse/SPARK-922>, at least.



Re: creating new ami image for spark ec2 commands

Posted by un...@gmail.com.
Creating AMIs from scratch is a complete pain in the ass. If you have a spare week, sure. I understand why the team avoids it.

The easiest way is probably to spin up a working instance and then use Amazon's "save as new AMI", but that has some major limitations, especially with software not expecting it. ("There are two of me now!") Worker nodes might cope better than the master.

But yes, I also would love new AMIs that don't pull down 200 meg every time I spin up.
("Spin up a cluster in five minutes" HA!) AMIs per region are also good for costs. I've thought of doing up new ones (since I have experience), but I have no time and other issues first. Perhaps once I know Spark better.

At least with Spark, we have more control over the scripts exactly because they are "primitive". I had a quick look at YARN/Ambari, and it wasn't obvious they were any better with EC2, at a hundred times the complexity.

I expect most AWS-heavy companies have a full-time person just managing AMIs. They are that annoying. It's what makes Cloudera attractive.

Jeremy Lee   BCompSci (Hons)
The Unorthodox Engineers
