You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/10/03 01:30:51 UTC

Re: EC2 clusters ready in launch time + 30 seconds

Is there perhaps a way to define an AMI programmatically? Like, a
collection of base AMI id + list of required stuff to be installed + list
of required configuration changes. I’m guessing that’s what people use
things like Puppet, Ansible, or maybe also AWS CloudFormation for, right?

If we could do something like that, then with every new release of Spark we
could quickly and easily create new AMIs that have everything we need.
spark-ec2 would only have to bring up the instances and do a minimal amount
of configuration, and the only thing we’d need to track in the Spark repo
is the code that defines what goes on the AMI, as well as a list of the AMI
ids specific to each release.

I’m just thinking out loud here. Does this make sense?

Nate,

Any progress on your end with this work?

Nick

On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> It should be possible to improve cluster launch time if we are careful
> about what commands we run during setup. One way to do this would be to
> walk down the list of things we do for cluster initialization and see if
> there is anything we can do make things faster. Unfortunately this might be
> pretty time consuming, but I don't know of a better strategy. The place to
> start would be the setup.sh file at
> https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>
> Here are some things that take a lot of time and could be improved:
> 1. Creating swap partitions on all machines. We could check if there is a
> way to get EC2 to always mount a swap partition
> 2. Copying / syncing things across slaves. The copy-dir script is called
> too many times right now and each time it pauses for a few milliseconds
> between slaves [1]. This could be improved by removing unnecessary copies
> 3. We could make less frequently used modules like Tachyon, persistent hdfs
> not a part of the default setup.
>
> [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
>
> Thanks
> Shivaram
>
>
>
>
> On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
> > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <na...@reactor8.com> wrote:
> >
> > > Starting to work through some automation/config stuff for spark stack
> on
> > > EC2 with a project, will be focusing the work through the apache bigtop
> > > effort to start, can then share with spark community directly as things
> > > progress if people are interested
> >
> >
> > Let us know how that goes. I'm definitely interested in hearing more.
> >
> > Nick
> >
>

Re: EC2 clusters ready in launch time + 30 seconds

Posted by Nicholas Chammas <ni...@gmail.com>.

FYI: I've created SPARK-3821: Develop an automated way of creating Spark
images (AMI, Docker, and others)
<https://issues.apache.org/jira/browse/SPARK-3821>

On Mon, Oct 6, 2014 at 4:48 PM, Daniil Osipov <da...@shazam.com>
wrote:

> I've also been looking at this. Basically, the Spark EC2 script is
> excellent for small development clusters of several nodes, but isn't
> suitable for production. It handles instance setup in a single threaded
> manner, while it can easily be parallelized. It also doesn't handle failure
> well, ex when an instance fails to start or is taking too long to respond.
>
> Our desire was to have an equivalent to Amazon EMR[1] API that would
> trigger Spark jobs, including specified cluster setup. I've done some work
> towards that end, and it would benefit from an updated AMI greatly.
>
> Dan
>
> [1]
> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
>
> On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> Thanks for posting that script, Patrick. It looks like a good place to
>> start.
>>
>> Regarding Docker vs. Packer, as I understand it you can use Packer to
>> create Docker containers at the same time as AMIs and other image types.
>>
>> Nick
>>
>>
>> On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>>
>> > Hey All,
>> >
>> > Just a couple notes. I recently posted a shell script for creating the
>> > AMI's from a clean Amazon Linux AMI.
>> >
>> > https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
>> >
>> > I think I will update the AMI's soon to get the most recent security
>> > updates. For spark-ec2's purpose this is probably sufficient (we'll
>> > only need to re-create them every few months).
>> >
>> > However, it would be cool if someone wanted to tackle providing a more
>> > general mechanism for defining Spark-friendly "images" that can be
>> > used more generally. I had thought that docker might be a good way to
>> > go for something like this - but maybe this packer thing is good too.
>> >
>> > For one thing, if we had a standard image we could use it to create
>> > containers for running Spark's unit test, which would be really cool.
>> > This would help a lot with random issues around port and filesystem
>> > contention we have for unit tests.
>> >
>> > I'm not sure if the long term place for this would be inside the spark
>> > codebase or a community library or what. But it would definitely be
>> > very valuable to have if someone wanted to take it on.
>> >
>> > - Patrick
>> >
>> > On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
>> > <ni...@gmail.com> wrote:
>> > > FYI: There is an existing issue -- SPARK-3314
>> > > <https://issues.apache.org/jira/browse/SPARK-3314> -- about scripting
>> > the
>> > > creation of Spark AMIs.
>> > >
>> > > With Packer, it looks like we may be able to script the creation of
>> > > multiple image types (VMWare, GCE, AMI, Docker, etc...) at once from a
>> > > single Packer template. That's very cool.
>> > >
>> > > I'll be looking into this.
>> > >
>> > > Nick
>> > >
>> > >
>> > > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <
>> > nicholas.chammas@gmail.com
>> > >> wrote:
>> > >
>> > >> Thanks for the update, Nate. I'm looking forward to seeing how these
>> > >> projects turn out.
>> > >>
>> > >> David, Packer looks very, very interesting. I'm gonna look into it
>> more
>> > >> next week.
>> > >>
>> > >> Nick
>> > >>
>> > >>
>> > >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com>
>> wrote:
>> > >>
>> > >>> Bit of progress on our end, bit of lagging as well.  Our guy leading
>> > >>> effort got little bogged down on client project to update hive/sql
>> > testbed
>> > >>> to latest spark/sparkSQL, also launching public service so we have
>> > been bit
>> > >>> scattered recently.
>> > >>>
>> > >>> Will have some more updates probably after next week.  We are
>> planning
>> > on
>> > >>> taking our client work around hive/spark, plus taking over the
>> bigtop
>> > >>> automation work to modernize and get that fit for human consumption
>> > outside
>> > >>> or org.  All our work and puppet modules will be open sourced,
>> > documented,
>> > >>> hopefully start to rally some other folks around effort that find it
>> > useful
>> > >>>
>> > >>> Side note, another effort we are looking into is gradle
>> tests/support.
>> > >>> We have been leveraging serverspec for some basic infrastructure
>> > tests, but
>> > >>> with bigtop switching over to gradle builds/testing setup in 0.8 we
>> > want to
>> > >>> include support for that in our own efforts, probably some stuff
>> that
>> > can
>> > >>> be learned and leveraged in spark world for repeatable/tested
>> > infrastructure
>> > >>>
>> > >>> If anyone has any specific automation questions to your environment
>> you
>> > >>> can drop me a line directly.., will try to help out best I can.
>> Else
>> > will
>> > >>> post update to dev list once we get on top of our own product
>> release
>> > and
>> > >>> the bigtop work
>> > >>>
>> > >>> Nate
>> > >>>
>> > >>>
>> > >>> -----Original Message-----
>> > >>> From: David Rowe [mailto:davidrowe@gmail.com]
>> > >>> Sent: Thursday, October 02, 2014 4:44 PM
>> > >>> To: Nicholas Chammas
>> > >>> Cc: dev; Shivaram Venkataraman
>> > >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
>> > >>>
>> > >>> I think this is exactly what packer is for. See e.g.
>> > >>> http://www.packer.io/intro/getting-started/build-image.html
>> > >>>
>> > >>> On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*)
>> > has
>> > >>> a bad package for httpd, whcih causes ganglia not to start. For some
>> > reason
>> > >>> I can't get access to the raw AMI to fix it.
>> > >>>
>> > >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
>> > >>> nicholas.chammas@gmail.com
>> > >>> > wrote:
>> > >>>
>> > >>> > Is there perhaps a way to define an AMI programmatically? Like, a
>> > >>> > collection of base AMI id + list of required stuff to be
>> installed +
>> > >>> > list of required configuration changes. I'm guessing that's what
>> > >>> > people use things like Puppet, Ansible, or maybe also AWS
>> > >>> CloudFormation for, right?
>> > >>> >
>> > >>> > If we could do something like that, then with every new release of
>> > >>> > Spark we could quickly and easily create new AMIs that have
>> > everything
>> > >>> we need.
>> > >>> > spark-ec2 would only have to bring up the instances and do a
>> minimal
>> > >>> > amount of configuration, and the only thing we'd need to track in
>> the
>> > >>> > Spark repo is the code that defines what goes on the AMI, as well
>> as
>> > a
>> > >>> > list of the AMI ids specific to each release.
>> > >>> >
>> > >>> > I'm just thinking out loud here. Does this make sense?
>> > >>> >
>> > >>> > Nate,
>> > >>> >
>> > >>> > Any progress on your end with this work?
>> > >>> >
>> > >>> > Nick
>> > >>> >
>> > >>> >
>> > >>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
>> > >>> > shivaram@eecs.berkeley.edu> wrote:
>> > >>> >
>> > >>> > > It should be possible to improve cluster launch time if we are
>> > >>> > > careful about what commands we run during setup. One way to do
>> this
>> > >>> > > would be to walk down the list of things we do for cluster
>> > >>> > > initialization and see if there is anything we can do make
>> things
>> > >>> > > faster. Unfortunately this might
>> > >>> > be
>> > >>> > > pretty time consuming, but I don't know of a better strategy.
>> The
>> > >>> > > place
>> > >>> > to
>> > >>> > > start would be the setup.sh file at
>> > >>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>> > >>> > >
>> > >>> > > Here are some things that take a lot of time and could be
>> improved:
>> > >>> > > 1. Creating swap partitions on all machines. We could check if
>> > there
>> > >>> > > is a way to get EC2 to always mount a swap partition 2. Copying
>> /
>> > >>> > > syncing things across slaves. The copy-dir script is called too
>> > many
>> > >>> > > times right now and each time it pauses for a few milliseconds
>> > >>> > > between slaves [1]. This could be improved by removing
>> unnecessary
>> > >>> > > copies 3. We could make less frequently used modules like
>> Tachyon,
>> > >>> > > persistent
>> > >>> > hdfs
>> > >>> > > not a part of the default setup.
>> > >>> > >
>> > >>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
>> > >>> > >
>> > >>> > > Thanks
>> > >>> > > Shivaram
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
>> > >>> > > nicholas.chammas@gmail.com> wrote:
>> > >>> > >
>> > >>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <
>> nate@reactor8.com
>> > >
>> > >>> > wrote:
>> > >>> > > >
>> > >>> > > > > Starting to work through some automation/config stuff for
>> spark
>> > >>> > > > > stack
>> > >>> > > on
>> > >>> > > > > EC2 with a project, will be focusing the work through the
>> > apache
>> > >>> > bigtop
>> > >>> > > > > effort to start, can then share with spark community
>> directly
>> > as
>> > >>> > things
>> > >>> > > > > progress if people are interested
>> > >>> > > >
>> > >>> > > >
>> > >>> > > > Let us know how that goes. I'm definitely interested in
>> hearing
>> > >>> more.
>> > >>> > > >
>> > >>> > > > Nick
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>>
>> > >>
>> >
>>
>
>

Re: EC2 clusters ready in launch time + 30 seconds

Posted by Nicholas Chammas <ni...@gmail.com>.

I've posted
<https://issues.apache.org/jira/browse/SPARK-3821?focusedCommentId=14203280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14203280>
an initial proposal and implementation of using Packer to automate
generating Spark AMIs to SPARK-3821
<https://issues.apache.org/jira/browse/SPARK-3821>.

On Mon, Oct 6, 2014 at 7:40 PM, David Rowe <da...@gmail.com> wrote:

> I agree with this - there is also the issue of different sized masters and
> slaves, and numbers of executors for hefty machines (e.g. r3.8xlarges),
> tagging of instances and volumes (we use this for cost attribution at my
> workplace), and running in VPCs.
>
> I think think it might be useful to take a layered approach: the first
> step could be getting a good reliable image produced - Nick's ticket - then
> doing some work on the launch script.
>
> Regarding the EMR like service - I think I heard that AWS is planning to
> add spark support to EMR, but as usual there's nothing firm until it's
> released.
>
>
> On Tue, Oct 7, 2014 at 7:48 AM, Daniil Osipov <da...@shazam.com>
> wrote:
>
>> I've also been looking at this. Basically, the Spark EC2 script is
>> excellent for small development clusters of several nodes, but isn't
>> suitable for production. It handles instance setup in a single threaded
>> manner, while it can easily be parallelized. It also doesn't handle
>> failure
>> well, ex when an instance fails to start or is taking too long to respond.
>>
>> Our desire was to have an equivalent to Amazon EMR[1] API that would
>> trigger Spark jobs, including specified cluster setup. I've done some work
>> towards that end, and it would benefit from an updated AMI greatly.
>>
>> Dan
>>
>> [1]
>>
>> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
>>
>> On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas <
>> nicholas.chammas@gmail.com
>> > wrote:
>>
>> > Thanks for posting that script, Patrick. It looks like a good place to
>> > start.
>> >
>> > Regarding Docker vs. Packer, as I understand it you can use Packer to
>> > create Docker containers at the same time as AMIs and other image types.
>> >
>> > Nick
>> >
>> >
>> > On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pw...@gmail.com>
>> > wrote:
>> >
>> > > Hey All,
>> > >
>> > > Just a couple notes. I recently posted a shell script for creating the
>> > > AMI's from a clean Amazon Linux AMI.
>> > >
>> > > https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
>> > >
>> > > I think I will update the AMI's soon to get the most recent security
>> > > updates. For spark-ec2's purpose this is probably sufficient (we'll
>> > > only need to re-create them every few months).
>> > >
>> > > However, it would be cool if someone wanted to tackle providing a more
>> > > general mechanism for defining Spark-friendly "images" that can be
>> > > used more generally. I had thought that docker might be a good way to
>> > > go for something like this - but maybe this packer thing is good too.
>> > >
>> > > For one thing, if we had a standard image we could use it to create
>> > > containers for running Spark's unit test, which would be really cool.
>> > > This would help a lot with random issues around port and filesystem
>> > > contention we have for unit tests.
>> > >
>> > > I'm not sure if the long term place for this would be inside the spark
>> > > codebase or a community library or what. But it would definitely be
>> > > very valuable to have if someone wanted to take it on.
>> > >
>> > > - Patrick
>> > >
>> > > On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
>> > > <ni...@gmail.com> wrote:
>> > > > FYI: There is an existing issue -- SPARK-3314
>> > > > <https://issues.apache.org/jira/browse/SPARK-3314> -- about
>> scripting
>> > > the
>> > > > creation of Spark AMIs.
>> > > >
>> > > > With Packer, it looks like we may be able to script the creation of
>> > > > multiple image types (VMWare, GCE, AMI, Docker, etc...) at once
>> from a
>> > > > single Packer template. That's very cool.
>> > > >
>> > > > I'll be looking into this.
>> > > >
>> > > > Nick
>> > > >
>> > > >
>> > > > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <
>> > > nicholas.chammas@gmail.com
>> > > >> wrote:
>> > > >
>> > > >> Thanks for the update, Nate. I'm looking forward to seeing how
>> these
>> > > >> projects turn out.
>> > > >>
>> > > >> David, Packer looks very, very interesting. I'm gonna look into it
>> > more
>> > > >> next week.
>> > > >>
>> > > >> Nick
>> > > >>
>> > > >>
>> > > >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com>
>> > wrote:
>> > > >>
>> > > >>> Bit of progress on our end, bit of lagging as well.  Our guy
>> leading
>> > > >>> effort got little bogged down on client project to update hive/sql
>> > > testbed
>> > > >>> to latest spark/sparkSQL, also launching public service so we have
>> > > been bit
>> > > >>> scattered recently.
>> > > >>>
>> > > >>> Will have some more updates probably after next week.  We are
>> > planning
>> > > on
>> > > >>> taking our client work around hive/spark, plus taking over the
>> bigtop
>> > > >>> automation work to modernize and get that fit for human
>> consumption
>> > > outside
>> > > >>> or org.  All our work and puppet modules will be open sourced,
>> > > documented,
>> > > >>> hopefully start to rally some other folks around effort that find
>> it
>> > > useful
>> > > >>>
>> > > >>> Side note, another effort we are looking into is gradle
>> > tests/support.
>> > > >>> We have been leveraging serverspec for some basic infrastructure
>> > > tests, but
>> > > >>> with bigtop switching over to gradle builds/testing setup in 0.8
>> we
>> > > want to
>> > > >>> include support for that in our own efforts, probably some stuff
>> that
>> > > can
>> > > >>> be learned and leveraged in spark world for repeatable/tested
>> > > infrastructure
>> > > >>>
>> > > >>> If anyone has any specific automation questions to your
>> environment
>> > you
>> > > >>> can drop me a line directly.., will try to help out best I can.
>> Else
>> > > will
>> > > >>> post update to dev list once we get on top of our own product
>> release
>> > > and
>> > > >>> the bigtop work
>> > > >>>
>> > > >>> Nate
>> > > >>>
>> > > >>>
>> > > >>> -----Original Message-----
>> > > >>> From: David Rowe [mailto:davidrowe@gmail.com]
>> > > >>> Sent: Thursday, October 02, 2014 4:44 PM
>> > > >>> To: Nicholas Chammas
>> > > >>> Cc: dev; Shivaram Venkataraman
>> > > >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
>> > > >>>
>> > > >>> I think this is exactly what packer is for. See e.g.
>> > > >>> http://www.packer.io/intro/getting-started/build-image.html
>> > > >>>
>> > > >>> On a related note, the current AMI for hvm systems (e.g. m3.*,
>> r3.*)
>> > > has
>> > > >>> a bad package for httpd, whcih causes ganglia not to start. For
>> some
>> > > reason
>> > > >>> I can't get access to the raw AMI to fix it.
>> > > >>>
>> > > >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
>> > > >>> nicholas.chammas@gmail.com
>> > > >>> > wrote:
>> > > >>>
>> > > >>> > Is there perhaps a way to define an AMI programmatically? Like,
>> a
>> > > >>> > collection of base AMI id + list of required stuff to be
>> installed
>> > +
>> > > >>> > list of required configuration changes. I'm guessing that's what
>> > > >>> > people use things like Puppet, Ansible, or maybe also AWS
>> > > >>> CloudFormation for, right?
>> > > >>> >
>> > > >>> > If we could do something like that, then with every new release
>> of
>> > > >>> > Spark we could quickly and easily create new AMIs that have
>> > > everything
>> > > >>> we need.
>> > > >>> > spark-ec2 would only have to bring up the instances and do a
>> > minimal
>> > > >>> > amount of configuration, and the only thing we'd need to track
>> in
>> > the
>> > > >>> > Spark repo is the code that defines what goes on the AMI, as
>> well
>> > as
>> > > a
>> > > >>> > list of the AMI ids specific to each release.
>> > > >>> >
>> > > >>> > I'm just thinking out loud here. Does this make sense?
>> > > >>> >
>> > > >>> > Nate,
>> > > >>> >
>> > > >>> > Any progress on your end with this work?
>> > > >>> >
>> > > >>> > Nick
>> > > >>> >
>> > > >>> >
>> > > >>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
>> > > >>> > shivaram@eecs.berkeley.edu> wrote:
>> > > >>> >
>> > > >>> > > It should be possible to improve cluster launch time if we are
>> > > >>> > > careful about what commands we run during setup. One way to do
>> > this
>> > > >>> > > would be to walk down the list of things we do for cluster
>> > > >>> > > initialization and see if there is anything we can do make
>> things
>> > > >>> > > faster. Unfortunately this might
>> > > >>> > be
>> > > >>> > > pretty time consuming, but I don't know of a better strategy.
>> The
>> > > >>> > > place
>> > > >>> > to
>> > > >>> > > start would be the setup.sh file at
>> > > >>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>> > > >>> > >
>> > > >>> > > Here are some things that take a lot of time and could be
>> > improved:
>> > > >>> > > 1. Creating swap partitions on all machines. We could check if
>> > > there
>> > > >>> > > is a way to get EC2 to always mount a swap partition 2.
>> Copying /
>> > > >>> > > syncing things across slaves. The copy-dir script is called
>> too
>> > > many
>> > > >>> > > times right now and each time it pauses for a few milliseconds
>> > > >>> > > between slaves [1]. This could be improved by removing
>> > unnecessary
>> > > >>> > > copies 3. We could make less frequently used modules like
>> > Tachyon,
>> > > >>> > > persistent
>> > > >>> > hdfs
>> > > >>> > > not a part of the default setup.
>> > > >>> > >
>> > > >>> > > [1]
>> https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
>> > > >>> > >
>> > > >>> > > Thanks
>> > > >>> > > Shivaram
>> > > >>> > >
>> > > >>> > >
>> > > >>> > >
>> > > >>> > >
>> > > >>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
>> > > >>> > > nicholas.chammas@gmail.com> wrote:
>> > > >>> > >
>> > > >>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <
>> > nate@reactor8.com
>> > > >
>> > > >>> > wrote:
>> > > >>> > > >
>> > > >>> > > > > Starting to work through some automation/config stuff for
>> > spark
>> > > >>> > > > > stack
>> > > >>> > > on
>> > > >>> > > > > EC2 with a project, will be focusing the work through the
>> > > apache
>> > > >>> > bigtop
>> > > >>> > > > > effort to start, can then share with spark community
>> directly
>> > > as
>> > > >>> > things
>> > > >>> > > > > progress if people are interested
>> > > >>> > > >
>> > > >>> > > >
>> > > >>> > > > Let us know how that goes. I'm definitely interested in
>> hearing
>> > > >>> more.
>> > > >>> > > >
>> > > >>> > > > Nick
>> > > >>> > > >
>> > > >>> > >
>> > > >>> >
>> > > >>>
>> > > >>>
>> > > >>
>> > >
>> >
>>
>
>

Re: EC2 clusters ready in launch time + 30 seconds

Posted by David Rowe <da...@gmail.com>.

I agree with this - there is also the issue of different sized masters and
slaves, and numbers of executors for hefty machines (e.g. r3.8xlarges),
tagging of instances and volumes (we use this for cost attribution at my
workplace), and running in VPCs.

I think think it might be useful to take a layered approach: the first step
could be getting a good reliable image produced - Nick's ticket - then
doing some work on the launch script.

Regarding the EMR like service - I think I heard that AWS is planning to
add spark support to EMR, but as usual there's nothing firm until it's
released.


On Tue, Oct 7, 2014 at 7:48 AM, Daniil Osipov <da...@shazam.com>
wrote:

> I've also been looking at this. Basically, the Spark EC2 script is
> excellent for small development clusters of several nodes, but isn't
> suitable for production. It handles instance setup in a single threaded
> manner, while it can easily be parallelized. It also doesn't handle failure
> well, ex when an instance fails to start or is taking too long to respond.
>
> Our desire was to have an equivalent to Amazon EMR[1] API that would
> trigger Spark jobs, including specified cluster setup. I've done some work
> towards that end, and it would benefit from an updated AMI greatly.
>
> Dan
>
> [1]
>
> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
>
> On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas <
> nicholas.chammas@gmail.com
> > wrote:
>
> > Thanks for posting that script, Patrick. It looks like a good place to
> > start.
> >
> > Regarding Docker vs. Packer, as I understand it you can use Packer to
> > create Docker containers at the same time as AMIs and other image types.
> >
> > Nick
> >
> >
> > On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pw...@gmail.com>
> > wrote:
> >
> > > Hey All,
> > >
> > > Just a couple notes. I recently posted a shell script for creating the
> > > AMI's from a clean Amazon Linux AMI.
> > >
> > > https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
> > >
> > > I think I will update the AMI's soon to get the most recent security
> > > updates. For spark-ec2's purpose this is probably sufficient (we'll
> > > only need to re-create them every few months).
> > >
> > > However, it would be cool if someone wanted to tackle providing a more
> > > general mechanism for defining Spark-friendly "images" that can be
> > > used more generally. I had thought that docker might be a good way to
> > > go for something like this - but maybe this packer thing is good too.
> > >
> > > For one thing, if we had a standard image we could use it to create
> > > containers for running Spark's unit test, which would be really cool.
> > > This would help a lot with random issues around port and filesystem
> > > contention we have for unit tests.
> > >
> > > I'm not sure if the long term place for this would be inside the spark
> > > codebase or a community library or what. But it would definitely be
> > > very valuable to have if someone wanted to take it on.
> > >
> > > - Patrick
> > >
> > > On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
> > > <ni...@gmail.com> wrote:
> > > > FYI: There is an existing issue -- SPARK-3314
> > > > <https://issues.apache.org/jira/browse/SPARK-3314> -- about
> scripting
> > > the
> > > > creation of Spark AMIs.
> > > >
> > > > With Packer, it looks like we may be able to script the creation of
> > > > multiple image types (VMWare, GCE, AMI, Docker, etc...) at once from
> a
> > > > single Packer template. That's very cool.
> > > >
> > > > I'll be looking into this.
> > > >
> > > > Nick
> > > >
> > > >
> > > > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <
> > > nicholas.chammas@gmail.com
> > > >> wrote:
> > > >
> > > >> Thanks for the update, Nate. I'm looking forward to seeing how these
> > > >> projects turn out.
> > > >>
> > > >> David, Packer looks very, very interesting. I'm gonna look into it
> > more
> > > >> next week.
> > > >>
> > > >> Nick
> > > >>
> > > >>
> > > >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com>
> > wrote:
> > > >>
> > > >>> Bit of progress on our end, bit of lagging as well.  Our guy
> leading
> > > >>> effort got little bogged down on client project to update hive/sql
> > > testbed
> > > >>> to latest spark/sparkSQL, also launching public service so we have
> > > been bit
> > > >>> scattered recently.
> > > >>>
> > > >>> Will have some more updates probably after next week.  We are
> > planning
> > > on
> > > >>> taking our client work around hive/spark, plus taking over the
> bigtop
> > > >>> automation work to modernize and get that fit for human consumption
> > > outside
> > > >>> or org.  All our work and puppet modules will be open sourced,
> > > documented,
> > > >>> hopefully start to rally some other folks around effort that find
> it
> > > useful
> > > >>>
> > > >>> Side note, another effort we are looking into is gradle
> > tests/support.
> > > >>> We have been leveraging serverspec for some basic infrastructure
> > > tests, but
> > > >>> with bigtop switching over to gradle builds/testing setup in 0.8 we
> > > want to
> > > >>> include support for that in our own efforts, probably some stuff
> that
> > > can
> > > >>> be learned and leveraged in spark world for repeatable/tested
> > > infrastructure
> > > >>>
> > > >>> If anyone has any specific automation questions to your environment
> > you
> > > >>> can drop me a line directly.., will try to help out best I can.
> Else
> > > will
> > > >>> post update to dev list once we get on top of our own product
> release
> > > and
> > > >>> the bigtop work
> > > >>>
> > > >>> Nate
> > > >>>
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: David Rowe [mailto:davidrowe@gmail.com]
> > > >>> Sent: Thursday, October 02, 2014 4:44 PM
> > > >>> To: Nicholas Chammas
> > > >>> Cc: dev; Shivaram Venkataraman
> > > >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
> > > >>>
> > > >>> I think this is exactly what packer is for. See e.g.
> > > >>> http://www.packer.io/intro/getting-started/build-image.html
> > > >>>
> > > >>> On a related note, the current AMI for hvm systems (e.g. m3.*,
> r3.*)
> > > has
> > > >>> a bad package for httpd, whcih causes ganglia not to start. For
> some
> > > reason
> > > >>> I can't get access to the raw AMI to fix it.
> > > >>>
> > > >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
> > > >>> nicholas.chammas@gmail.com
> > > >>> > wrote:
> > > >>>
> > > >>> > Is there perhaps a way to define an AMI programmatically? Like, a
> > > >>> > collection of base AMI id + list of required stuff to be
> installed
> > +
> > > >>> > list of required configuration changes. I'm guessing that's what
> > > >>> > people use things like Puppet, Ansible, or maybe also AWS
> > > >>> CloudFormation for, right?
> > > >>> >
> > > >>> > If we could do something like that, then with every new release
> of
> > > >>> > Spark we could quickly and easily create new AMIs that have
> > > everything
> > > >>> we need.
> > > >>> > spark-ec2 would only have to bring up the instances and do a
> > minimal
> > > >>> > amount of configuration, and the only thing we'd need to track in
> > the
> > > >>> > Spark repo is the code that defines what goes on the AMI, as well
> > as
> > > a
> > > >>> > list of the AMI ids specific to each release.
> > > >>> >
> > > >>> > I'm just thinking out loud here. Does this make sense?
> > > >>> >
> > > >>> > Nate,
> > > >>> >
> > > >>> > Any progress on your end with this work?
> > > >>> >
> > > >>> > Nick
> > > >>> >
> > > >>> >
> > > >>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
> > > >>> > shivaram@eecs.berkeley.edu> wrote:
> > > >>> >
> > > >>> > > It should be possible to improve cluster launch time if we are
> > > >>> > > careful about what commands we run during setup. One way to do
> > this
> > > >>> > > would be to walk down the list of things we do for cluster
> > > >>> > > initialization and see if there is anything we can do make
> things
> > > >>> > > faster. Unfortunately this might
> > > >>> > be
> > > >>> > > pretty time consuming, but I don't know of a better strategy.
> The
> > > >>> > > place
> > > >>> > to
> > > >>> > > start would be the setup.sh file at
> > > >>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
> > > >>> > >
> > > >>> > > Here are some things that take a lot of time and could be
> > improved:
> > > >>> > > 1. Creating swap partitions on all machines. We could check if
> > > there
> > > >>> > > is a way to get EC2 to always mount a swap partition 2.
> Copying /
> > > >>> > > syncing things across slaves. The copy-dir script is called too
> > > many
> > > >>> > > times right now and each time it pauses for a few milliseconds
> > > >>> > > between slaves [1]. This could be improved by removing
> > unnecessary
> > > >>> > > copies 3. We could make less frequently used modules like
> > Tachyon,
> > > >>> > > persistent
> > > >>> > hdfs
> > > >>> > > not a part of the default setup.
> > > >>> > >
> > > >>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
> > > >>> > >
> > > >>> > > Thanks
> > > >>> > > Shivaram
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> > > >>> > > nicholas.chammas@gmail.com> wrote:
> > > >>> > >
> > > >>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <
> > nate@reactor8.com
> > > >
> > > >>> > wrote:
> > > >>> > > >
> > > >>> > > > > Starting to work through some automation/config stuff for
> > spark
> > > >>> > > > > stack
> > > >>> > > on
> > > >>> > > > > EC2 with a project, will be focusing the work through the
> > > apache
> > > >>> > bigtop
> > > >>> > > > > effort to start, can then share with spark community
> directly
> > > as
> > > >>> > things
> > > >>> > > > > progress if people are interested
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > Let us know how that goes. I'm definitely interested in
> hearing
> > > >>> more.
> > > >>> > > >
> > > >>> > > > Nick
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>>
> > > >>
> > >
> >
>

Re: EC2 clusters ready in launch time + 30 seconds

Posted by Daniil Osipov <da...@shazam.com>.

I've also been looking at this. Basically, the Spark EC2 script is
excellent for small development clusters of several nodes, but isn't
suitable for production. It handles instance setup in a single threaded
manner, while it can easily be parallelized. It also doesn't handle failure
well, ex when an instance fails to start or is taking too long to respond.

Our desire was to have an equivalent to Amazon EMR[1] API that would
trigger Spark jobs, including specified cluster setup. I've done some work
towards that end, and it would benefit from an updated AMI greatly.

Dan

[1]
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html

On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> Thanks for posting that script, Patrick. It looks like a good place to
> start.
>
> Regarding Docker vs. Packer, as I understand it you can use Packer to
> create Docker containers at the same time as AMIs and other image types.
>
> Nick
>
>
> On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pw...@gmail.com>
> wrote:
>
> > Hey All,
> >
> > Just a couple notes. I recently posted a shell script for creating the
> > AMI's from a clean Amazon Linux AMI.
> >
> > https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
> >
> > I think I will update the AMI's soon to get the most recent security
> > updates. For spark-ec2's purpose this is probably sufficient (we'll
> > only need to re-create them every few months).
> >
> > However, it would be cool if someone wanted to tackle providing a more
> > general mechanism for defining Spark-friendly "images" that can be
> > used more generally. I had thought that docker might be a good way to
> > go for something like this - but maybe this packer thing is good too.
> >
> > For one thing, if we had a standard image we could use it to create
> > containers for running Spark's unit test, which would be really cool.
> > This would help a lot with random issues around port and filesystem
> > contention we have for unit tests.
> >
> > I'm not sure if the long term place for this would be inside the spark
> > codebase or a community library or what. But it would definitely be
> > very valuable to have if someone wanted to take it on.
> >
> > - Patrick
> >
> > On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
> > <ni...@gmail.com> wrote:
> > > FYI: There is an existing issue -- SPARK-3314
> > > <https://issues.apache.org/jira/browse/SPARK-3314> -- about scripting
> > the
> > > creation of Spark AMIs.
> > >
> > > With Packer, it looks like we may be able to script the creation of
> > > multiple image types (VMWare, GCE, AMI, Docker, etc...) at once from a
> > > single Packer template. That's very cool.
> > >
> > > I'll be looking into this.
> > >
> > > Nick
> > >
> > >
> > > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <
> > nicholas.chammas@gmail.com
> > >> wrote:
> > >
> > >> Thanks for the update, Nate. I'm looking forward to seeing how these
> > >> projects turn out.
> > >>
> > >> David, Packer looks very, very interesting. I'm gonna look into it
> more
> > >> next week.
> > >>
> > >> Nick
> > >>
> > >>
> > >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com>
> wrote:
> > >>
> > >>> Bit of progress on our end, bit of lagging as well.  Our guy leading
> > >>> effort got little bogged down on client project to update hive/sql
> > testbed
> > >>> to latest spark/sparkSQL, also launching public service so we have
> > been bit
> > >>> scattered recently.
> > >>>
> > >>> Will have some more updates probably after next week.  We are
> planning
> > on
> > >>> taking our client work around hive/spark, plus taking over the bigtop
> > >>> automation work to modernize and get that fit for human consumption
> > outside
> > >>> or org.  All our work and puppet modules will be open sourced,
> > documented,
> > >>> hopefully start to rally some other folks around effort that find it
> > useful
> > >>>
> > >>> Side note, another effort we are looking into is gradle
> tests/support.
> > >>> We have been leveraging serverspec for some basic infrastructure
> > tests, but
> > >>> with bigtop switching over to gradle builds/testing setup in 0.8 we
> > want to
> > >>> include support for that in our own efforts, probably some stuff that
> > can
> > >>> be learned and leveraged in spark world for repeatable/tested
> > infrastructure
> > >>>
> > >>> If anyone has any specific automation questions to your environment
> you
> > >>> can drop me a line directly.., will try to help out best I can.  Else
> > will
> > >>> post update to dev list once we get on top of our own product release
> > and
> > >>> the bigtop work
> > >>>
> > >>> Nate
> > >>>
> > >>>
> > >>> -----Original Message-----
> > >>> From: David Rowe [mailto:davidrowe@gmail.com]
> > >>> Sent: Thursday, October 02, 2014 4:44 PM
> > >>> To: Nicholas Chammas
> > >>> Cc: dev; Shivaram Venkataraman
> > >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
> > >>>
> > >>> I think this is exactly what packer is for. See e.g.
> > >>> http://www.packer.io/intro/getting-started/build-image.html
> > >>>
> > >>> On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*)
> > has
> > >>> a bad package for httpd, whcih causes ganglia not to start. For some
> > reason
> > >>> I can't get access to the raw AMI to fix it.
> > >>>
> > >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
> > >>> nicholas.chammas@gmail.com
> > >>> > wrote:
> > >>>
> > >>> > Is there perhaps a way to define an AMI programmatically? Like, a
> > >>> > collection of base AMI id + list of required stuff to be installed
> +
> > >>> > list of required configuration changes. I'm guessing that's what
> > >>> > people use things like Puppet, Ansible, or maybe also AWS
> > >>> CloudFormation for, right?
> > >>> >
> > >>> > If we could do something like that, then with every new release of
> > >>> > Spark we could quickly and easily create new AMIs that have
> > everything
> > >>> we need.
> > >>> > spark-ec2 would only have to bring up the instances and do a
> minimal
> > >>> > amount of configuration, and the only thing we'd need to track in
> the
> > >>> > Spark repo is the code that defines what goes on the AMI, as well
> as
> > a
> > >>> > list of the AMI ids specific to each release.
> > >>> >
> > >>> > I'm just thinking out loud here. Does this make sense?
> > >>> >
> > >>> > Nate,
> > >>> >
> > >>> > Any progress on your end with this work?
> > >>> >
> > >>> > Nick
> > >>> >
> > >>> >
> > >>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
> > >>> > shivaram@eecs.berkeley.edu> wrote:
> > >>> >
> > >>> > > It should be possible to improve cluster launch time if we are
> > >>> > > careful about what commands we run during setup. One way to do
> this
> > >>> > > would be to walk down the list of things we do for cluster
> > >>> > > initialization and see if there is anything we can do make things
> > >>> > > faster. Unfortunately this might
> > >>> > be
> > >>> > > pretty time consuming, but I don't know of a better strategy. The
> > >>> > > place
> > >>> > to
> > >>> > > start would be the setup.sh file at
> > >>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
> > >>> > >
> > >>> > > Here are some things that take a lot of time and could be
> improved:
> > >>> > > 1. Creating swap partitions on all machines. We could check if
> > there
> > >>> > > is a way to get EC2 to always mount a swap partition 2. Copying /
> > >>> > > syncing things across slaves. The copy-dir script is called too
> > many
> > >>> > > times right now and each time it pauses for a few milliseconds
> > >>> > > between slaves [1]. This could be improved by removing
> unnecessary
> > >>> > > copies 3. We could make less frequently used modules like
> Tachyon,
> > >>> > > persistent
> > >>> > hdfs
> > >>> > > not a part of the default setup.
> > >>> > >
> > >>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
> > >>> > >
> > >>> > > Thanks
> > >>> > > Shivaram
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> > >>> > > nicholas.chammas@gmail.com> wrote:
> > >>> > >
> > >>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <
> nate@reactor8.com
> > >
> > >>> > wrote:
> > >>> > > >
> > >>> > > > > Starting to work through some automation/config stuff for
> spark
> > >>> > > > > stack
> > >>> > > on
> > >>> > > > > EC2 with a project, will be focusing the work through the
> > apache
> > >>> > bigtop
> > >>> > > > > effort to start, can then share with spark community directly
> > as
> > >>> > things
> > >>> > > > > progress if people are interested
> > >>> > > >
> > >>> > > >
> > >>> > > > Let us know how that goes. I'm definitely interested in hearing
> > >>> more.
> > >>> > > >
> > >>> > > > Nick
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>>
> > >>
> >
>

Re: EC2 clusters ready in launch time + 30 seconds

Posted by Nicholas Chammas <ni...@gmail.com>.

Thanks for posting that script, Patrick. It looks like a good place to
start.

Regarding Docker vs. Packer, as I understand it you can use Packer to
create Docker containers at the same time as AMIs and other image types.

Nick


On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pw...@gmail.com> wrote:

> Hey All,
>
> Just a couple notes. I recently posted a shell script for creating the
> AMI's from a clean Amazon Linux AMI.
>
> https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
>
> I think I will update the AMI's soon to get the most recent security
> updates. For spark-ec2's purpose this is probably sufficient (we'll
> only need to re-create them every few months).
>
> However, it would be cool if someone wanted to tackle providing a more
> general mechanism for defining Spark-friendly "images" that can be
> used more generally. I had thought that docker might be a good way to
> go for something like this - but maybe this packer thing is good too.
>
> For one thing, if we had a standard image we could use it to create
> containers for running Spark's unit test, which would be really cool.
> This would help a lot with random issues around port and filesystem
> contention we have for unit tests.
>
> I'm not sure if the long term place for this would be inside the spark
> codebase or a community library or what. But it would definitely be
> very valuable to have if someone wanted to take it on.
>
> - Patrick
>
> On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
> <ni...@gmail.com> wrote:
> > FYI: There is an existing issue -- SPARK-3314
> > <https://issues.apache.org/jira/browse/SPARK-3314> -- about scripting
> the
> > creation of Spark AMIs.
> >
> > With Packer, it looks like we may be able to script the creation of
> > multiple image types (VMWare, GCE, AMI, Docker, etc...) at once from a
> > single Packer template. That's very cool.
> >
> > I'll be looking into this.
> >
> > Nick
> >
> >
> > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com
> >> wrote:
> >
> >> Thanks for the update, Nate. I'm looking forward to seeing how these
> >> projects turn out.
> >>
> >> David, Packer looks very, very interesting. I'm gonna look into it more
> >> next week.
> >>
> >> Nick
> >>
> >>
> >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com> wrote:
> >>
> >>> Bit of progress on our end, bit of lagging as well.  Our guy leading
> >>> effort got little bogged down on client project to update hive/sql
> testbed
> >>> to latest spark/sparkSQL, also launching public service so we have
> been bit
> >>> scattered recently.
> >>>
> >>> Will have some more updates probably after next week.  We are planning
> on
> >>> taking our client work around hive/spark, plus taking over the bigtop
> >>> automation work to modernize and get that fit for human consumption
> outside
> >>> or org.  All our work and puppet modules will be open sourced,
> documented,
> >>> hopefully start to rally some other folks around effort that find it
> useful
> >>>
> >>> Side note, another effort we are looking into is gradle tests/support.
> >>> We have been leveraging serverspec for some basic infrastructure
> tests, but
> >>> with bigtop switching over to gradle builds/testing setup in 0.8 we
> want to
> >>> include support for that in our own efforts, probably some stuff that
> can
> >>> be learned and leveraged in spark world for repeatable/tested
> infrastructure
> >>>
> >>> If anyone has any specific automation questions to your environment you
> >>> can drop me a line directly.., will try to help out best I can.  Else
> will
> >>> post update to dev list once we get on top of our own product release
> and
> >>> the bigtop work
> >>>
> >>> Nate
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: David Rowe [mailto:davidrowe@gmail.com]
> >>> Sent: Thursday, October 02, 2014 4:44 PM
> >>> To: Nicholas Chammas
> >>> Cc: dev; Shivaram Venkataraman
> >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
> >>>
> >>> I think this is exactly what packer is for. See e.g.
> >>> http://www.packer.io/intro/getting-started/build-image.html
> >>>
> >>> On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*)
> has
> >>> a bad package for httpd, whcih causes ganglia not to start. For some
> reason
> >>> I can't get access to the raw AMI to fix it.
> >>>
> >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
> >>> nicholas.chammas@gmail.com
> >>> > wrote:
> >>>
> >>> > Is there perhaps a way to define an AMI programmatically? Like, a
> >>> > collection of base AMI id + list of required stuff to be installed +
> >>> > list of required configuration changes. I'm guessing that's what
> >>> > people use things like Puppet, Ansible, or maybe also AWS
> >>> CloudFormation for, right?
> >>> >
> >>> > If we could do something like that, then with every new release of
> >>> > Spark we could quickly and easily create new AMIs that have
> everything
> >>> we need.
> >>> > spark-ec2 would only have to bring up the instances and do a minimal
> >>> > amount of configuration, and the only thing we'd need to track in the
> >>> > Spark repo is the code that defines what goes on the AMI, as well as
> a
> >>> > list of the AMI ids specific to each release.
> >>> >
> >>> > I'm just thinking out loud here. Does this make sense?
> >>> >
> >>> > Nate,
> >>> >
> >>> > Any progress on your end with this work?
> >>> >
> >>> > Nick
> >>> >
> >>> >
> >>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
> >>> > shivaram@eecs.berkeley.edu> wrote:
> >>> >
> >>> > > It should be possible to improve cluster launch time if we are
> >>> > > careful about what commands we run during setup. One way to do this
> >>> > > would be to walk down the list of things we do for cluster
> >>> > > initialization and see if there is anything we can do make things
> >>> > > faster. Unfortunately this might
> >>> > be
> >>> > > pretty time consuming, but I don't know of a better strategy. The
> >>> > > place
> >>> > to
> >>> > > start would be the setup.sh file at
> >>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
> >>> > >
> >>> > > Here are some things that take a lot of time and could be improved:
> >>> > > 1. Creating swap partitions on all machines. We could check if
> there
> >>> > > is a way to get EC2 to always mount a swap partition 2. Copying /
> >>> > > syncing things across slaves. The copy-dir script is called too
> many
> >>> > > times right now and each time it pauses for a few milliseconds
> >>> > > between slaves [1]. This could be improved by removing unnecessary
> >>> > > copies 3. We could make less frequently used modules like Tachyon,
> >>> > > persistent
> >>> > hdfs
> >>> > > not a part of the default setup.
> >>> > >
> >>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
> >>> > >
> >>> > > Thanks
> >>> > > Shivaram
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> >>> > > nicholas.chammas@gmail.com> wrote:
> >>> > >
> >>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <nate@reactor8.com
> >
> >>> > wrote:
> >>> > > >
> >>> > > > > Starting to work through some automation/config stuff for spark
> >>> > > > > stack
> >>> > > on
> >>> > > > > EC2 with a project, will be focusing the work through the
> apache
> >>> > bigtop
> >>> > > > > effort to start, can then share with spark community directly
> as
> >>> > things
> >>> > > > > progress if people are interested
> >>> > > >
> >>> > > >
> >>> > > > Let us know how that goes. I'm definitely interested in hearing
> >>> more.
> >>> > > >
> >>> > > > Nick
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>>
> >>
>

Re: EC2 clusters ready in launch time + 30 seconds

Posted by Patrick Wendell <pw...@gmail.com>.

Hey All,

Just a couple notes. I recently posted a shell script for creating the
AMI's from a clean Amazon Linux AMI.

https://github.com/mesos/spark-ec2/blob/v3/create_image.sh

I think I will update the AMI's soon to get the most recent security
updates. For spark-ec2's purpose this is probably sufficient (we'll
only need to re-create them every few months).

However, it would be cool if someone wanted to tackle providing a more
general mechanism for defining Spark-friendly "images" that can be
used more generally. I had thought that docker might be a good way to
go for something like this - but maybe this packer thing is good too.

For one thing, if we had a standard image we could use it to create
containers for running Spark's unit test, which would be really cool.
This would help a lot with random issues around port and filesystem
contention we have for unit tests.

I'm not sure if the long term place for this would be inside the spark
codebase or a community library or what. But it would definitely be
very valuable to have if someone wanted to take it on.

- Patrick

On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
<ni...@gmail.com> wrote:
> FYI: There is an existing issue -- SPARK-3314
> <https://issues.apache.org/jira/browse/SPARK-3314> -- about scripting the
> creation of Spark AMIs.
>
> With Packer, it looks like we may be able to script the creation of
> multiple image types (VMWare, GCE, AMI, Docker, etc...) at once from a
> single Packer template. That's very cool.
>
> I'll be looking into this.
>
> Nick
>
>
> On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <nicholas.chammas@gmail.com
>> wrote:
>
>> Thanks for the update, Nate. I'm looking forward to seeing how these
>> projects turn out.
>>
>> David, Packer looks very, very interesting. I'm gonna look into it more
>> next week.
>>
>> Nick
>>
>>
>> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com> wrote:
>>
>>> Bit of progress on our end, bit of lagging as well.  Our guy leading
>>> effort got little bogged down on client project to update hive/sql testbed
>>> to latest spark/sparkSQL, also launching public service so we have been bit
>>> scattered recently.
>>>
>>> Will have some more updates probably after next week.  We are planning on
>>> taking our client work around hive/spark, plus taking over the bigtop
>>> automation work to modernize and get that fit for human consumption outside
>>> or org.  All our work and puppet modules will be open sourced, documented,
>>> hopefully start to rally some other folks around effort that find it useful
>>>
>>> Side note, another effort we are looking into is gradle tests/support.
>>> We have been leveraging serverspec for some basic infrastructure tests, but
>>> with bigtop switching over to gradle builds/testing setup in 0.8 we want to
>>> include support for that in our own efforts, probably some stuff that can
>>> be learned and leveraged in spark world for repeatable/tested infrastructure
>>>
>>> If anyone has any specific automation questions to your environment you
>>> can drop me a line directly.., will try to help out best I can.  Else will
>>> post update to dev list once we get on top of our own product release and
>>> the bigtop work
>>>
>>> Nate
>>>
>>>
>>> -----Original Message-----
>>> From: David Rowe [mailto:davidrowe@gmail.com]
>>> Sent: Thursday, October 02, 2014 4:44 PM
>>> To: Nicholas Chammas
>>> Cc: dev; Shivaram Venkataraman
>>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
>>>
>>> I think this is exactly what packer is for. See e.g.
>>> http://www.packer.io/intro/getting-started/build-image.html
>>>
>>> On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*) has
>>> a bad package for httpd, whcih causes ganglia not to start. For some reason
>>> I can't get access to the raw AMI to fix it.
>>>
>>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com
>>> > wrote:
>>>
>>> > Is there perhaps a way to define an AMI programmatically? Like, a
>>> > collection of base AMI id + list of required stuff to be installed +
>>> > list of required configuration changes. I'm guessing that's what
>>> > people use things like Puppet, Ansible, or maybe also AWS
>>> CloudFormation for, right?
>>> >
>>> > If we could do something like that, then with every new release of
>>> > Spark we could quickly and easily create new AMIs that have everything
>>> we need.
>>> > spark-ec2 would only have to bring up the instances and do a minimal
>>> > amount of configuration, and the only thing we'd need to track in the
>>> > Spark repo is the code that defines what goes on the AMI, as well as a
>>> > list of the AMI ids specific to each release.
>>> >
>>> > I'm just thinking out loud here. Does this make sense?
>>> >
>>> > Nate,
>>> >
>>> > Any progress on your end with this work?
>>> >
>>> > Nick
>>> >
>>> >
>>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
>>> > shivaram@eecs.berkeley.edu> wrote:
>>> >
>>> > > It should be possible to improve cluster launch time if we are
>>> > > careful about what commands we run during setup. One way to do this
>>> > > would be to walk down the list of things we do for cluster
>>> > > initialization and see if there is anything we can do make things
>>> > > faster. Unfortunately this might
>>> > be
>>> > > pretty time consuming, but I don't know of a better strategy. The
>>> > > place
>>> > to
>>> > > start would be the setup.sh file at
>>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>>> > >
>>> > > Here are some things that take a lot of time and could be improved:
>>> > > 1. Creating swap partitions on all machines. We could check if there
>>> > > is a way to get EC2 to always mount a swap partition 2. Copying /
>>> > > syncing things across slaves. The copy-dir script is called too many
>>> > > times right now and each time it pauses for a few milliseconds
>>> > > between slaves [1]. This could be improved by removing unnecessary
>>> > > copies 3. We could make less frequently used modules like Tachyon,
>>> > > persistent
>>> > hdfs
>>> > > not a part of the default setup.
>>> > >
>>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
>>> > >
>>> > > Thanks
>>> > > Shivaram
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
>>> > > nicholas.chammas@gmail.com> wrote:
>>> > >
>>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <na...@reactor8.com>
>>> > wrote:
>>> > > >
>>> > > > > Starting to work through some automation/config stuff for spark
>>> > > > > stack
>>> > > on
>>> > > > > EC2 with a project, will be focusing the work through the apache
>>> > bigtop
>>> > > > > effort to start, can then share with spark community directly as
>>> > things
>>> > > > > progress if people are interested
>>> > > >
>>> > > >
>>> > > > Let us know how that goes. I'm definitely interested in hearing
>>> more.
>>> > > >
>>> > > > Nick
>>> > > >
>>> > >
>>> >
>>>
>>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: EC2 clusters ready in launch time + 30 seconds

Posted by Nicholas Chammas <ni...@gmail.com>.

FYI: There is an existing issue -- SPARK-3314
<https://issues.apache.org/jira/browse/SPARK-3314> -- about scripting the
creation of Spark AMIs.

With Packer, it looks like we may be able to script the creation of
multiple image types (VMWare, GCE, AMI, Docker, etc...) at once from a
single Packer template. That's very cool.

I'll be looking into this.

Nick


On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> Thanks for the update, Nate. I'm looking forward to seeing how these
> projects turn out.
>
> David, Packer looks very, very interesting. I'm gonna look into it more
> next week.
>
> Nick
>
>
> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com> wrote:
>
>> Bit of progress on our end, bit of lagging as well.  Our guy leading
>> effort got little bogged down on client project to update hive/sql testbed
>> to latest spark/sparkSQL, also launching public service so we have been bit
>> scattered recently.
>>
>> Will have some more updates probably after next week.  We are planning on
>> taking our client work around hive/spark, plus taking over the bigtop
>> automation work to modernize and get that fit for human consumption outside
>> or org.  All our work and puppet modules will be open sourced, documented,
>> hopefully start to rally some other folks around effort that find it useful
>>
>> Side note, another effort we are looking into is gradle tests/support.
>> We have been leveraging serverspec for some basic infrastructure tests, but
>> with bigtop switching over to gradle builds/testing setup in 0.8 we want to
>> include support for that in our own efforts, probably some stuff that can
>> be learned and leveraged in spark world for repeatable/tested infrastructure
>>
>> If anyone has any specific automation questions to your environment you
>> can drop me a line directly.., will try to help out best I can.  Else will
>> post update to dev list once we get on top of our own product release and
>> the bigtop work
>>
>> Nate
>>
>>
>> -----Original Message-----
>> From: David Rowe [mailto:davidrowe@gmail.com]
>> Sent: Thursday, October 02, 2014 4:44 PM
>> To: Nicholas Chammas
>> Cc: dev; Shivaram Venkataraman
>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
>>
>> I think this is exactly what packer is for. See e.g.
>> http://www.packer.io/intro/getting-started/build-image.html
>>
>> On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*) has
>> a bad package for httpd, whcih causes ganglia not to start. For some reason
>> I can't get access to the raw AMI to fix it.
>>
>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
>> nicholas.chammas@gmail.com
>> > wrote:
>>
>> > Is there perhaps a way to define an AMI programmatically? Like, a
>> > collection of base AMI id + list of required stuff to be installed +
>> > list of required configuration changes. I’m guessing that’s what
>> > people use things like Puppet, Ansible, or maybe also AWS
>> CloudFormation for, right?
>> >
>> > If we could do something like that, then with every new release of
>> > Spark we could quickly and easily create new AMIs that have everything
>> we need.
>> > spark-ec2 would only have to bring up the instances and do a minimal
>> > amount of configuration, and the only thing we’d need to track in the
>> > Spark repo is the code that defines what goes on the AMI, as well as a
>> > list of the AMI ids specific to each release.
>> >
>> > I’m just thinking out loud here. Does this make sense?
>> >
>> > Nate,
>> >
>> > Any progress on your end with this work?
>> >
>> > Nick
>> > 
>> >
>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
>> > shivaram@eecs.berkeley.edu> wrote:
>> >
>> > > It should be possible to improve cluster launch time if we are
>> > > careful about what commands we run during setup. One way to do this
>> > > would be to walk down the list of things we do for cluster
>> > > initialization and see if there is anything we can do make things
>> > > faster. Unfortunately this might
>> > be
>> > > pretty time consuming, but I don't know of a better strategy. The
>> > > place
>> > to
>> > > start would be the setup.sh file at
>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>> > >
>> > > Here are some things that take a lot of time and could be improved:
>> > > 1. Creating swap partitions on all machines. We could check if there
>> > > is a way to get EC2 to always mount a swap partition 2. Copying /
>> > > syncing things across slaves. The copy-dir script is called too many
>> > > times right now and each time it pauses for a few milliseconds
>> > > between slaves [1]. This could be improved by removing unnecessary
>> > > copies 3. We could make less frequently used modules like Tachyon,
>> > > persistent
>> > hdfs
>> > > not a part of the default setup.
>> > >
>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
>> > >
>> > > Thanks
>> > > Shivaram
>> > >
>> > >
>> > >
>> > >
>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
>> > > nicholas.chammas@gmail.com> wrote:
>> > >
>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <na...@reactor8.com>
>> > wrote:
>> > > >
>> > > > > Starting to work through some automation/config stuff for spark
>> > > > > stack
>> > > on
>> > > > > EC2 with a project, will be focusing the work through the apache
>> > bigtop
>> > > > > effort to start, can then share with spark community directly as
>> > things
>> > > > > progress if people are interested
>> > > >
>> > > >
>> > > > Let us know how that goes. I'm definitely interested in hearing
>> more.
>> > > >
>> > > > Nick
>> > > >
>> > >
>> >
>>
>>
>

Re: EC2 clusters ready in launch time + 30 seconds

Posted by Nicholas Chammas <ni...@gmail.com>.

Thanks for the update, Nate. I'm looking forward to seeing how these
projects turn out.

David, Packer looks very, very interesting. I'm gonna look into it more
next week.

Nick


On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com> wrote:

> Bit of progress on our end, bit of lagging as well.  Our guy leading
> effort got little bogged down on client project to update hive/sql testbed
> to latest spark/sparkSQL, also launching public service so we have been bit
> scattered recently.
>
> Will have some more updates probably after next week.  We are planning on
> taking our client work around hive/spark, plus taking over the bigtop
> automation work to modernize and get that fit for human consumption outside
> or org.  All our work and puppet modules will be open sourced, documented,
> hopefully start to rally some other folks around effort that find it useful
>
> Side note, another effort we are looking into is gradle tests/support.  We
> have been leveraging serverspec for some basic infrastructure tests, but
> with bigtop switching over to gradle builds/testing setup in 0.8 we want to
> include support for that in our own efforts, probably some stuff that can
> be learned and leveraged in spark world for repeatable/tested infrastructure
>
> If anyone has any specific automation questions to your environment you
> can drop me a line directly.., will try to help out best I can.  Else will
> post update to dev list once we get on top of our own product release and
> the bigtop work
>
> Nate
>
>
> -----Original Message-----
> From: David Rowe [mailto:davidrowe@gmail.com]
> Sent: Thursday, October 02, 2014 4:44 PM
> To: Nicholas Chammas
> Cc: dev; Shivaram Venkataraman
> Subject: Re: EC2 clusters ready in launch time + 30 seconds
>
> I think this is exactly what packer is for. See e.g.
> http://www.packer.io/intro/getting-started/build-image.html
>
> On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*) has a
> bad package for httpd, whcih causes ganglia not to start. For some reason I
> can't get access to the raw AMI to fix it.
>
> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
> nicholas.chammas@gmail.com
> > wrote:
>
> > Is there perhaps a way to define an AMI programmatically? Like, a
> > collection of base AMI id + list of required stuff to be installed +
> > list of required configuration changes. I’m guessing that’s what
> > people use things like Puppet, Ansible, or maybe also AWS CloudFormation
> for, right?
> >
> > If we could do something like that, then with every new release of
> > Spark we could quickly and easily create new AMIs that have everything
> we need.
> > spark-ec2 would only have to bring up the instances and do a minimal
> > amount of configuration, and the only thing we’d need to track in the
> > Spark repo is the code that defines what goes on the AMI, as well as a
> > list of the AMI ids specific to each release.
> >
> > I’m just thinking out loud here. Does this make sense?
> >
> > Nate,
> >
> > Any progress on your end with this work?
> >
> > Nick
> > 
> >
> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
> > shivaram@eecs.berkeley.edu> wrote:
> >
> > > It should be possible to improve cluster launch time if we are
> > > careful about what commands we run during setup. One way to do this
> > > would be to walk down the list of things we do for cluster
> > > initialization and see if there is anything we can do make things
> > > faster. Unfortunately this might
> > be
> > > pretty time consuming, but I don't know of a better strategy. The
> > > place
> > to
> > > start would be the setup.sh file at
> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
> > >
> > > Here are some things that take a lot of time and could be improved:
> > > 1. Creating swap partitions on all machines. We could check if there
> > > is a way to get EC2 to always mount a swap partition 2. Copying /
> > > syncing things across slaves. The copy-dir script is called too many
> > > times right now and each time it pauses for a few milliseconds
> > > between slaves [1]. This could be improved by removing unnecessary
> > > copies 3. We could make less frequently used modules like Tachyon,
> > > persistent
> > hdfs
> > > not a part of the default setup.
> > >
> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
> > >
> > > Thanks
> > > Shivaram
> > >
> > >
> > >
> > >
> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> > > nicholas.chammas@gmail.com> wrote:
> > >
> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <na...@reactor8.com>
> > wrote:
> > > >
> > > > > Starting to work through some automation/config stuff for spark
> > > > > stack
> > > on
> > > > > EC2 with a project, will be focusing the work through the apache
> > bigtop
> > > > > effort to start, can then share with spark community directly as
> > things
> > > > > progress if people are interested
> > > >
> > > >
> > > > Let us know how that goes. I'm definitely interested in hearing more.
> > > >
> > > > Nick
> > > >
> > >
> >
>
>

RE: EC2 clusters ready in launch time + 30 seconds

Posted by Nate D'Amico <na...@reactor8.com>.

Bit of progress on our end, bit of lagging as well.  Our guy leading effort got little bogged down on client project to update hive/sql testbed to latest spark/sparkSQL, also launching public service so we have been bit scattered recently.

Will have some more updates probably after next week.  We are planning on taking our client work around hive/spark, plus taking over the bigtop automation work to modernize and get that fit for human consumption outside or org.  All our work and puppet modules will be open sourced, documented, hopefully start to rally some other folks around effort that find it useful

Side note, another effort we are looking into is gradle tests/support.  We have been leveraging serverspec for some basic infrastructure tests, but with bigtop switching over to gradle builds/testing setup in 0.8 we want to include support for that in our own efforts, probably some stuff that can be learned and leveraged in spark world for repeatable/tested infrastructure 

If anyone has any specific automation questions to your environment you can drop me a line directly.., will try to help out best I can.  Else will post update to dev list once we get on top of our own product release and the bigtop work

Nate

-----Original Message-----
From: David Rowe [mailto:davidrowe@gmail.com] 
Sent: Thursday, October 02, 2014 4:44 PM
To: Nicholas Chammas
Cc: dev; Shivaram Venkataraman
Subject: Re: EC2 clusters ready in launch time + 30 seconds

I think this is exactly what packer is for. See e.g.
http://www.packer.io/intro/getting-started/build-image.html

On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*) has a bad package for httpd, whcih causes ganglia not to start. For some reason I can't get access to the raw AMI to fix it.

On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> Is there perhaps a way to define an AMI programmatically? Like, a 
> collection of base AMI id + list of required stuff to be installed + 
> list of required configuration changes. I’m guessing that’s what 
> people use things like Puppet, Ansible, or maybe also AWS CloudFormation for, right?
>
> If we could do something like that, then with every new release of 
> Spark we could quickly and easily create new AMIs that have everything we need.
> spark-ec2 would only have to bring up the instances and do a minimal 
> amount of configuration, and the only thing we’d need to track in the 
> Spark repo is the code that defines what goes on the AMI, as well as a 
> list of the AMI ids specific to each release.
>
> I’m just thinking out loud here. Does this make sense?
>
> Nate,
>
> Any progress on your end with this work?
>
> Nick
> 
>
> On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman < 
> shivaram@eecs.berkeley.edu> wrote:
>
> > It should be possible to improve cluster launch time if we are 
> > careful about what commands we run during setup. One way to do this 
> > would be to walk down the list of things we do for cluster 
> > initialization and see if there is anything we can do make things 
> > faster. Unfortunately this might
> be
> > pretty time consuming, but I don't know of a better strategy. The 
> > place
> to
> > start would be the setup.sh file at
> > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
> >
> > Here are some things that take a lot of time and could be improved:
> > 1. Creating swap partitions on all machines. We could check if there 
> > is a way to get EC2 to always mount a swap partition 2. Copying / 
> > syncing things across slaves. The copy-dir script is called too many 
> > times right now and each time it pauses for a few milliseconds 
> > between slaves [1]. This could be improved by removing unnecessary 
> > copies 3. We could make less frequently used modules like Tachyon, 
> > persistent
> hdfs
> > not a part of the default setup.
> >
> > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
> >
> > Thanks
> > Shivaram
> >
> >
> >
> >
> > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas < 
> > nicholas.chammas@gmail.com> wrote:
> >
> > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <na...@reactor8.com>
> wrote:
> > >
> > > > Starting to work through some automation/config stuff for spark 
> > > > stack
> > on
> > > > EC2 with a project, will be focusing the work through the apache
> bigtop
> > > > effort to start, can then share with spark community directly as
> things
> > > > progress if people are interested
> > >
> > >
> > > Let us know how that goes. I'm definitely interested in hearing more.
> > >
> > > Nick
> > >
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: EC2 clusters ready in launch time + 30 seconds

Posted by David Rowe <da...@gmail.com>.

I think this is exactly what packer is for. See e.g.
http://www.packer.io/intro/getting-started/build-image.html

On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*) has a
bad package for httpd, whcih causes ganglia not to start. For some reason I
can't get access to the raw AMI to fix it.

On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> Is there perhaps a way to define an AMI programmatically? Like, a
> collection of base AMI id + list of required stuff to be installed + list
> of required configuration changes. I’m guessing that’s what people use
> things like Puppet, Ansible, or maybe also AWS CloudFormation for, right?
>
> If we could do something like that, then with every new release of Spark we
> could quickly and easily create new AMIs that have everything we need.
> spark-ec2 would only have to bring up the instances and do a minimal amount
> of configuration, and the only thing we’d need to track in the Spark repo
> is the code that defines what goes on the AMI, as well as a list of the AMI
> ids specific to each release.
>
> I’m just thinking out loud here. Does this make sense?
>
> Nate,
>
> Any progress on your end with this work?
>
> Nick
> 
>
> On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
> shivaram@eecs.berkeley.edu> wrote:
>
> > It should be possible to improve cluster launch time if we are careful
> > about what commands we run during setup. One way to do this would be to
> > walk down the list of things we do for cluster initialization and see if
> > there is anything we can do make things faster. Unfortunately this might
> be
> > pretty time consuming, but I don't know of a better strategy. The place
> to
> > start would be the setup.sh file at
> > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
> >
> > Here are some things that take a lot of time and could be improved:
> > 1. Creating swap partitions on all machines. We could check if there is a
> > way to get EC2 to always mount a swap partition
> > 2. Copying / syncing things across slaves. The copy-dir script is called
> > too many times right now and each time it pauses for a few milliseconds
> > between slaves [1]. This could be improved by removing unnecessary copies
> > 3. We could make less frequently used modules like Tachyon, persistent
> hdfs
> > not a part of the default setup.
> >
> > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
> >
> > Thanks
> > Shivaram
> >
> >
> >
> >
> > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> > nicholas.chammas@gmail.com> wrote:
> >
> > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <na...@reactor8.com>
> wrote:
> > >
> > > > Starting to work through some automation/config stuff for spark stack
> > on
> > > > EC2 with a project, will be focusing the work through the apache
> bigtop
> > > > effort to start, can then share with spark community directly as
> things
> > > > progress if people are interested
> > >
> > >
> > > Let us know how that goes. I'm definitely interested in hearing more.
> > >
> > > Nick
> > >
> >
>