Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2015/04/21 21:05:23 UTC

Is spark-ec2 for production use?

Is spark-ec2 intended for spinning up production Spark clusters?

I think the answer is no.

However, the docs for spark-ec2
<https://spark.apache.org/docs/latest/ec2-scripts.html> very much leave
that possibility open, and indeed I see many people asking questions or
opening issues that stem from some production use case they are trying to
fit spark-ec2 to.

Here's the latest example
<https://issues.apache.org/jira/browse/SPARK-6900?focusedCommentId=14504236&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14504236>
of someone using spark-ec2 to power their (presumably) production service.

Shouldn't we actively discourage people from using spark-ec2 in this way?

I understand there's no stopping people from doing what they want with it,
and certainly the questions and issues we receive about spark-ec2 are still
valid, even if they stem from discouraged use cases.

From what I understand, spark-ec2 is intended for quick experimentation,
one-off jobs, prototypes, and so forth.

If that's the case, it's best to stress this in the docs.

Nick

Re: Is spark-ec2 for production use?

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
I'm not sure it's exactly easy to define 'production' use. One thing we
could stress is that spark-ec2 is meant to be run manually (i.e. it outputs
errors, asks for prompts etc.) and that automating it is not in our scope
right now.

Shivaram

RE: Is spark-ec2 for production use?

Posted by na...@reactor8.com.
"Replacement for production-ish" is a stretch, to put it mildly; the UX just isn’t there yet for the average end user who wants push-button deployment.

Until recently the focus was heavily on infrastructure folks and people building their own distros. The project is turning towards "end users", so anyone from ops to dev/data-hacker will be able to extract value and get moving easily.

If you are brave enough to give it a go and play around with it in its current state, you can start with the README for the Puppet modules:

https://github.com/apache/bigtop/tree/master/bigtop-deploy/puppet

It is currently limited (i.e. no YARN or Mesos variants, and orchestration has not been added yet), but things will be stepping up a great deal coming out of the 1.0 release. If you do try it and run into problems, hop on the mailing list. The docs are another area that needs updating.

Thanks for the pointer to the JSON feed link; it is definitely handy for some smoke tests.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Is spark-ec2 for production use?

Posted by Nicholas Chammas <ni...@gmail.com>.
Nate, could you point us to an example of how one would use Bigtop as a
"more production-ish" replacement for spark-ec2? I took a look at the project
page <http://bigtop.apache.org/index.html>, but couldn't find any usage
examples. Perhaps we can link to them from the spark-ec2 docs.

Regarding tests to validate that Spark was set up correctly, I am using the
JSON feed from the Spark master web UI
<http://stackoverflow.com/a/29659630/877069> for starters. Y'all might find
it useful for the same purpose.
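
As a rough illustration of that kind of check, here is a minimal Python
sketch. It assumes the standalone master serves its state as JSON at
http://<master-host>:8080/json, and that the payload has a top-level
"status" field and a "workers" list whose entries carry a "state" field.
The port, the field names, and the helper names below are assumptions to
verify against your Spark version; none of them come from this thread.

```python
import json
from urllib.request import urlopen


def alive_workers(master_state):
    """Return the workers the master reports as ALIVE.

    Assumes each entry in the "workers" list has a "state" field.
    """
    return [w for w in master_state.get("workers", []) if w.get("state") == "ALIVE"]


def cluster_is_healthy(master_state, expected_workers):
    """True if the master is ALIVE and enough workers have registered."""
    return (
        master_state.get("status") == "ALIVE"
        and len(alive_workers(master_state)) >= expected_workers
    )


def fetch_master_state(master_host, port=8080, timeout=10):
    """Fetch the JSON view of the master web UI (assumed to live at /json)."""
    url = "http://{}:{}/json".format(master_host, port)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

After launching a cluster, a smoke test could then call
cluster_is_healthy(fetch_master_state("<master-host>"), expected_workers=2)
and fail the deploy if it returns False. The health check is kept separate
from the HTTP fetch so it can be exercised without a running cluster.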

Nick

RE: Is spark-ec2 for production use?

Posted by na...@reactor8.com.
Several of the Bigtop folks got together last week at ApacheCon, and this
was a popular topic for the next enhancements to Spark-related components
after getting 1.0 out the door.  Some leading topics were:

-deployment of Spark-specific clusters
     -Spark standalone, HDFS
     -Spark over YARN, HDFS
     -Spark on Mesos (talked to the Mesos folks about working to include
      it in Bigtop post 1.0)
     -the above plus variants of other Bigtop components (e.g. Kafka,
      Zeppelin, demo data generators)

One thing the group would like some help on is tests for Spark
environments, so that things can be validated post build/deploy and the CI
process enhanced. That way, if you choose to deploy via Bigtop in
test/prod/etc., you know things have gone through a certain amount of rigor
beforehand.

Nate

Re: Is spark-ec2 for production use?

Posted by Patrick Wendell <pw...@gmail.com>.
It could be a good idea to document this a bit. The original goals
were to give people an easy way to get started with Spark and also to
provide a consistent environment for our own experiments and
benchmarking of Spark at the AMPLab. Over time I've noticed a huge
amount of scope increase in terms of what people want to do and I do
know that many companies run production infrastructure based on
launching the EC2 scripts.

My feeling is that the general problem of deploying Spark with other
applications and frameworks is fairly well covered by projects which
specifically focus on packaging and automation (e.g. Whirr, BigTop,
etc). So I'd like to see a narrower focus on just getting a vanilla
Spark cluster up and running and make it clear that customization and
extension of that functionality is really not in scope.

This doesn't mean discouraging people from using it for production use
cases, but more that they shouldn't expect us to merge and maintain
things that seek to do broader integration with other technologies,
automation, etc.

- Patrick
