Posted to user@spark.apache.org by in4maniac <sa...@skimlinks.com> on 2017/02/23 12:23:12 UTC

New Amazon AMIs for EC2 script

Hi all,

I have been using the EC2 script to launch R&D pyspark clusters for a while
now. We use a lot of packages, such as numpy and scipy with openblas,
scikit-learn, bokeh, vowpal wabbit, and pystan, so all this time we have
been building AMIs on top of the standard spark-AMIs at
https://github.com/amplab/spark-ec2/tree/branch-1.6/ami-list/us-east-1

Mainly, I have done the following:
- updated yum
- changed the standard python to python 2.7
- changed pip to 2.7, installed a lot of libraries on top of the existing
AMIs, and created my own AMIs to avoid having to bootstrap.
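
For illustration, the steps above could be scripted roughly as follows. This is only a sketch under assumptions, not the exact commands used on these AMIs: the yum package names (python27, python27-pip), the `python2.7 -m pip` invocation, and the helper names build_commands and provision are all hypothetical, and the package list is just a subset of the libraries mentioned above.

```python
import subprocess

# Hypothetical package list, taken from the libraries named in this post.
PACKAGES = ["numpy", "scipy", "scikit-learn", "bokeh", "pystan"]


def build_commands(packages):
    """Return the provisioning commands as argv lists, in the order they run."""
    return [
        # Step 1: update the packages already installed through yum.
        ["sudo", "yum", "update", "-y"],
        # Step 2: install Python 2.7 and its pip (package names are an
        # assumption and may differ on the base image).
        ["sudo", "yum", "install", "-y", "python27", "python27-pip"],
        # Step 3: install the libraries with the Python 2.7 pip.
        ["sudo", "python2.7", "-m", "pip", "install"] + list(packages),
    ]


def provision(packages, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in build_commands(packages):
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises CalledProcessError if a step fails,
            # so a broken step stops the run instead of being ignored.
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    provision(PACKAGES)
```

Keeping the commands in one list makes it easy to review them before baking a new AMI, and the dry run lets you check the sequence without touching the instance.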

But the standard EC2 AMIs are from *early February 2014* and have now
become extremely fragile. For example, when I update a certain library,
ipython breaks, or pip breaks, and so forth.

Can someone please direct me to a more up-to-date AMI that I can use with
more confidence? I am also interested to know what needs to be in the AMI
if I wanted to build one from scratch (last resort :( ).

And isn't it time to have a ticket in the spark project to build a new suite
of AMIs for the EC2 script? https://issues.apache.org/jira/browse/SPARK-922 

Many thanks
in4maniac 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/New-Amazon-AMIs-for-EC2-script-tp28419.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: New Amazon AMIs for EC2 script

Posted by Nicholas Chammas <ni...@gmail.com>.
spark-ec2 has moved to GitHub and is no longer part of the Spark project. A
related issue from the current issue tracker that you may want to
follow/comment on is this one: https://github.com/amplab/spark-ec2/issues/74

As I said there, I think requiring custom AMIs is one of the major
maintenance headaches of spark-ec2. I solved this problem in my own
project, Flintrock <https://github.com/nchammas/flintrock>, by working with
the default Amazon Linux AMIs and letting people more freely bring their
own AMI.

Nick



Re: New Amazon AMIs for EC2 script

Posted by neil90 <ne...@icloud.com>.
You should look into AWS EMR instead, adding pip install steps to the
launch process. They have a pretty nice Jupyter notebook script that sets
up Jupyter and lets you choose what packages you want to install:
https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/
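
As a rough sketch of the "pip install steps at launch" idea (not the script from the blog post), a bootstrap action can be passed to boto3's EMR run_job_flow call. Everything specific here is an assumption for illustration: the S3 path of the bootstrap script, the release label, the instance type and count, and the helper names build_cluster_request and launch. The bootstrap script itself is something you would write and upload, for example one that pip-installs the package names it receives as arguments.

```python
def build_cluster_request(name, bootstrap_script, packages,
                          release_label="emr-5.3.1",
                          instance_type="m4.large",
                          instance_count=3):
    """Build keyword arguments for the boto3 EMR client's run_job_flow."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed release, adjust as needed
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running for interactive R&D work.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "BootstrapActions": [{
            "Name": "install-python-packages",
            "ScriptBootstrapAction": {
                # Hypothetical script in your own bucket; it runs on every
                # node at launch and receives the package names as arguments.
                "Path": bootstrap_script,
                "Args": list(packages),
            },
        }],
        # Default EMR roles; assumed to already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request):
    """Start the cluster and return its id (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request can be built without it
    response = boto3.client("emr").run_job_flow(**request)
    return response["JobFlowId"]
```

Usage would look something like launch(build_cluster_request("rnd-pyspark", "s3://my-bucket/install-packages.sh", ["numpy", "scipy"])), where the bucket and script name are placeholders. Since the packages are installed at launch, there is no custom AMI to maintain.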


