Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/11/08 09:38:16 UTC

Re: EC2 clusters ready in launch time + 30 seconds

I've posted
<https://issues.apache.org/jira/browse/SPARK-3821?focusedCommentId=14203280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14203280>
an initial proposal and implementation of using Packer to automate
generating Spark AMIs to SPARK-3821
<https://issues.apache.org/jira/browse/SPARK-3821>.
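The Packer approach proposed above boils down to a single template describing how to build the image. A minimal sketch of what such a template might look like (the AMI id, region, and script name here are placeholders, not the actual template attached to the JIRA):

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-xxxxxxxx",
    "instance_type": "m3.large",
    "ssh_username": "ec2-user",
    "ami_name": "spark-base-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "script": "create_image.sh"
  }]
}
```

`packer build` would then produce the AMI; adding more builders to the same template would produce other image types from the same provisioning steps.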

On Mon, Oct 6, 2014 at 7:40 PM, David Rowe <da...@gmail.com> wrote:

> I agree with this - there is also the issue of different sized masters and
> slaves, and numbers of executors for hefty machines (e.g. r3.8xlarges),
> tagging of instances and volumes (we use this for cost attribution at my
> workplace), and running in VPCs.
>
> I think it might be useful to take a layered approach: the first
> step could be getting a good reliable image produced - Nick's ticket - then
> doing some work on the launch script.
>
> Regarding the EMR-like service - I think I heard that AWS is planning to
> add Spark support to EMR, but as usual there's nothing firm until it's
> released.
>
>
> On Tue, Oct 7, 2014 at 7:48 AM, Daniil Osipov <da...@shazam.com>
> wrote:
>
>> I've also been looking at this. Basically, the Spark EC2 script is
>> excellent for small development clusters of several nodes, but isn't
>> suitable for production. It handles instance setup in a single-threaded
>> manner, though it could easily be parallelized. It also doesn't handle
>> failure well, e.g. when an instance fails to start or is taking too long
>> to respond.
>>
>> Our desire was to have an equivalent of the Amazon EMR [1] API that would
>> trigger Spark jobs, including specified cluster setup. I've done some work
>> towards that end, and it would benefit from an updated AMI greatly.
>>
>> Dan
>>
>> [1]
>>
>> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
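The single-threaded setup described above could be parallelized along these lines. This is only a sketch: `run_setup` stands in for whatever actually configures one node (e.g. SSH'ing in and running the setup script), and the retry count is arbitrary.

```python
# Sketch: parallel instance setup with retry, in place of a
# one-node-at-a-time loop. run_setup is a stand-in for the real
# per-node setup (e.g. SSH'ing a setup script to the instance).
from concurrent.futures import ThreadPoolExecutor, as_completed

def setup_instance(host, run_setup, retries=2):
    """Try to set up one instance, retrying on failure."""
    for attempt in range(retries + 1):
        try:
            return host, run_setup(host)
        except Exception:
            if attempt == retries:
                return host, None  # give up; caller can replace the node

def setup_cluster(hosts, run_setup, max_workers=16):
    """Set up all instances concurrently instead of one at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(setup_instance, h, run_setup) for h in hosts]
        for fut in as_completed(futures):
            host, status = fut.result()
            results[host] = status
    return results
```

A node that fails all retries comes back as `None` rather than aborting the whole launch, which addresses the failure-handling complaint as well as the speed one.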
>>
>> On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas
>> <nicholas.chammas@gmail.com> wrote:
>>
>> > Thanks for posting that script, Patrick. It looks like a good place to
>> > start.
>> >
>> > Regarding Docker vs. Packer, as I understand it you can use Packer to
>> > create Docker containers at the same time as AMIs and other image types.
>> >
>> > Nick
>> >
>> >
>> > On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pw...@gmail.com>
>> > wrote:
>> >
>> > > Hey All,
>> > >
>> > > Just a couple of notes. I recently posted a shell script for creating
>> > > the AMIs from a clean Amazon Linux AMI.
>> > >
>> > > https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
>> > >
>> > > I think I will update the AMIs soon to get the most recent security
>> > > updates. For spark-ec2's purpose this is probably sufficient (we'll
>> > > only need to re-create them every few months).
>> > >
>> > > However, it would be cool if someone wanted to tackle providing a more
>> > > general mechanism for defining Spark-friendly "images" that can be
>> > > used more generally. I had thought that Docker might be a good way to
>> > > go for something like this, but maybe this Packer thing is good too.
>> > >
>> > > For one thing, if we had a standard image we could use it to create
>> > > containers for running Spark's unit tests, which would be really cool.
>> > > This would help a lot with random issues around port and filesystem
>> > > contention we have for unit tests.
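A standard test image of the sort described above might be sketched as a Dockerfile; the base image and package names here are assumptions for illustration, not an existing published image:

```dockerfile
# Illustrative sketch of a standard image for running Spark unit tests;
# base image and package names are assumptions, not a published Dockerfile.
FROM centos:6
RUN yum install -y java-1.7.0-openjdk-devel git tar
ENV JAVA_HOME /usr/lib/jvm/java-1.7.0-openjdk.x86_64
WORKDIR /opt/spark
# Each test container gets its own filesystem and port namespace, which
# is what sidesteps the port/filesystem contention mentioned above.
CMD ["sbt/sbt", "test"]
```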
>> > >
>> > > I'm not sure if the long term place for this would be inside the spark
>> > > codebase or a community library or what. But it would definitely be
>> > > very valuable to have if someone wanted to take it on.
>> > >
>> > > - Patrick
>> > >
>> > > On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
>> > > <ni...@gmail.com> wrote:
>> > > > FYI: There is an existing issue -- SPARK-3314
>> > > > <https://issues.apache.org/jira/browse/SPARK-3314> -- about
>> > > > scripting the creation of Spark AMIs.
>> > > >
>> > > > With Packer, it looks like we may be able to script the creation of
>> > > > multiple image types (VMWare, GCE, AMI, Docker, etc.) at once from a
>> > > > single Packer template. That's very cool.
>> > > >
>> > > > I'll be looking into this.
>> > > >
>> > > > Nick
>> > > >
>> > > >
>> > > > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas
>> > > > <nicholas.chammas@gmail.com> wrote:
>> > > >
>> > > >> Thanks for the update, Nate. I'm looking forward to seeing how
>> > > >> these projects turn out.
>> > > >>
>> > > >> David, Packer looks very, very interesting. I'm gonna look into it
>> > > >> more next week.
>> > > >>
>> > > >> Nick
>> > > >>
>> > > >>
>> > > >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <na...@reactor8.com>
>> > > >> wrote:
>> > > >>
>> > > >>> A bit of progress on our end, and a bit of lagging as well. Our
>> > > >>> guy leading the effort got a little bogged down on a client project
>> > > >>> updating a Hive/SQL testbed to the latest Spark/Spark SQL, and we
>> > > >>> are also launching a public service, so we have been a bit
>> > > >>> scattered recently.
>> > > >>>
>> > > >>> Will have some more updates probably after next week. We are
>> > > >>> planning on taking our client work around Hive/Spark, plus taking
>> > > >>> over the Bigtop automation work, to modernize it and get it fit for
>> > > >>> human consumption outside our org. All our work and Puppet modules
>> > > >>> will be open sourced and documented; hopefully we will start to
>> > > >>> rally some other folks around the effort who find it useful.
>> > > >>>
>> > > >>> Side note: another effort we are looking into is Gradle
>> > > >>> tests/support. We have been leveraging serverspec for some basic
>> > > >>> infrastructure tests, but with Bigtop switching over to a Gradle
>> > > >>> build/testing setup in 0.8 we want to include support for that in
>> > > >>> our own efforts. There is probably some stuff there that can be
>> > > >>> learned and leveraged in the Spark world for repeatable/tested
>> > > >>> infrastructure.
>> > > >>> If anyone has automation questions specific to your environment,
>> > > >>> you can drop me a line directly, and I will try to help out as best
>> > > >>> I can. Otherwise I will post an update to the dev list once we get
>> > > >>> on top of our own product release and the Bigtop work.
>> > > >>>
>> > > >>> Nate
>> > > >>>
>> > > >>>
>> > > >>> -----Original Message-----
>> > > >>> From: David Rowe [mailto:davidrowe@gmail.com]
>> > > >>> Sent: Thursday, October 02, 2014 4:44 PM
>> > > >>> To: Nicholas Chammas
>> > > >>> Cc: dev; Shivaram Venkataraman
>> > > >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
>> > > >>>
>> > > >>> I think this is exactly what Packer is for. See e.g.
>> > > >>> http://www.packer.io/intro/getting-started/build-image.html
>> > > >>>
>> > > >>> On a related note, the current AMI for HVM systems (e.g. m3.*,
>> > > >>> r3.*) has a bad package for httpd, which causes Ganglia not to
>> > > >>> start. For some reason I can't get access to the raw AMI to fix it.
>> > > >>>
>> > > >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas
>> > > >>> <nicholas.chammas@gmail.com> wrote:
>> > > >>>
>> > > >>> > Is there perhaps a way to define an AMI programmatically? Like, a
>> > > >>> > collection of base AMI id + list of required stuff to be installed
>> > > >>> > + list of required configuration changes. I'm guessing that's what
>> > > >>> > people use things like Puppet, Ansible, or maybe also AWS
>> > > >>> > CloudFormation for, right?
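The "base AMI + packages + config changes" idea above can be sketched as a small declarative spec plus a renderer that turns it into a provisioning script. All names here are illustrative assumptions, not a real spark-ec2 API:

```python
# Sketch of an AMI defined programmatically: a declarative spec that a
# build tool like Packer, Puppet, or Ansible could consume. The AMI id,
# package list, and config edits are hypothetical examples.
BASE_AMI = "ami-xxxxxxxx"  # hypothetical base Amazon Linux AMI id

spark_image_spec = {
    "base_ami": BASE_AMI,
    "packages": ["java-1.7.0-openjdk", "git", "rsync", "ganglia"],
    "config_changes": {
        "/etc/ssh/sshd_config": {"GSSAPIAuthentication": "no"},
    },
}

def render_install_script(spec):
    """Turn the spec into the shell script a provisioner would run."""
    lines = ["#!/bin/bash", "set -e"]
    lines.append("yum install -y " + " ".join(spec["packages"]))
    for path, settings in spec["config_changes"].items():
        for key, value in settings.items():
            lines.append("echo '%s %s' >> %s" % (key, value, path))
    return "\n".join(lines)
```

The spec itself would be the thing tracked in the Spark repo, with per-release AMI ids recorded alongside it.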
>> > > >>> >
>> > > >>> > If we could do something like that, then with every new release
>> > > >>> > of Spark we could quickly and easily create new AMIs that have
>> > > >>> > everything we need. spark-ec2 would only have to bring up the
>> > > >>> > instances and do a minimal amount of configuration, and the only
>> > > >>> > thing we'd need to track in the Spark repo is the code that
>> > > >>> > defines what goes on the AMI, as well as a list of the AMI ids
>> > > >>> > specific to each release.
>> > > >>> >
>> > > >>> > I'm just thinking out loud here. Does this make sense?
>> > > >>> >
>> > > >>> > Nate,
>> > > >>> >
>> > > >>> > Any progress on your end with this work?
>> > > >>> >
>> > > >>> > Nick
>> > > >>> >
>> > > >>> >
>> > > >>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
>> > > >>> > shivaram@eecs.berkeley.edu> wrote:
>> > > >>> >
>> > > >>> > > It should be possible to improve cluster launch time if we are
>> > > >>> > > careful about what commands we run during setup. One way to do
>> > > >>> > > this would be to walk down the list of things we do for cluster
>> > > >>> > > initialization and see if there is anything we can do to make
>> > > >>> > > things faster. Unfortunately this might be pretty time
>> > > >>> > > consuming, but I don't know of a better strategy. The place to
>> > > >>> > > start would be the setup.sh file at
>> > > >>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>> > > >>> > >
>> > > >>> > > Here are some things that take a lot of time and could be
>> > > >>> > > improved:
>> > > >>> > > 1. Creating swap partitions on all machines. We could check if
>> > > >>> > > there is a way to get EC2 to always mount a swap partition.
>> > > >>> > > 2. Copying / syncing things across slaves. The copy-dir script
>> > > >>> > > is called too many times right now, and each time it pauses for
>> > > >>> > > a few milliseconds between slaves [1]. This could be improved
>> > > >>> > > by removing unnecessary copies.
>> > > >>> > > 3. We could make less frequently used modules like Tachyon and
>> > > >>> > > persistent HDFS not a part of the default setup.
>> > > >>> > >
>> > > >>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
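Point 2 above, the per-slave copy loop, could also be fanned out concurrently rather than run serially with a pause between slaves. A minimal sketch, with `copy_one` parameterized so the fan-out logic can be exercised without a real rsync:

```python
# Sketch: run the per-slave copy-dir rsync concurrently instead of
# looping over slaves one at a time with a sleep in between.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def rsync_to(slave, path):
    """One slave's copy; mirrors what copy-dir.sh runs per host."""
    return subprocess.call(
        ["rsync", "-az", "--delete", path, "%s:%s" % (slave, path)])

def copy_dir_parallel(slaves, path, copy_one=rsync_to, max_workers=8):
    """Run the per-slave copies concurrently; return slaves that failed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda s: (s, copy_one(s, path)), slaves))
    return [slave for slave, code in codes if code != 0]
```

This also surfaces which slaves failed to sync instead of silently moving on, which ties back to the failure-handling concerns earlier in the thread.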
>> > > >>> > >
>> > > >>> > > Thanks
>> > > >>> > > Shivaram
>> > > >>> > >
>> > > >>> > >
>> > > >>> > >
>> > > >>> > >
>> > > >>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
>> > > >>> > > nicholas.chammas@gmail.com> wrote:
>> > > >>> > >
>> > > >>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico
>> > > >>> > > > <nate@reactor8.com> wrote:
>> > > >>> > > >
>> > > >>> > > > > Starting to work through some automation/config stuff for
>> > > >>> > > > > the Spark stack on EC2 with a project. We will be focusing
>> > > >>> > > > > the work through the Apache Bigtop effort to start, and can
>> > > >>> > > > > then share with the Spark community directly as things
>> > > >>> > > > > progress, if people are interested.
>> > > >>> > > >
>> > > >>> > > >
>> > > >>> > > > Let us know how that goes. I'm definitely interested in
>> > > >>> > > > hearing more.
>> > > >>> > > >
>> > > >>> > > > Nick
>> > > >>> > > >
>> > > >>> > >
>> > > >>> >
>> > > >>>
>> > > >>>
>> > > >>
>> > >
>> >
>>
>
>