Posted to user@spark.apache.org by Aureliano Buendia <bu...@gmail.com> on 2014/01/13 05:00:31 UTC

Spark on google compute engine

Hi,

Has anyone worked on a script similar to spark-ec2 for google compute
engine?

Google Compute Engine claims faster instance start-up times, and that,
together with per-minute billing, makes it a desirable choice for Spark.

Re: Spark on google compute engine

Posted by Andre Schumacher <sc...@icsi.berkeley.edu>.
Hi,

Sorry for the late reply, this kind of slipped under my radar.

On 01/13/2014 12:12 PM, Aureliano Buendia wrote:
> On Mon, Jan 13, 2014 at 5:59 PM, Josh Rosen <ro...@gmail.com> wrote:
> 
>> If you'd like to use Spark with Docker, the AMPLab's Docker scripts might
>> be a nice starting point:
>>
> 
> Is it used in production?

Not as far as I know. But I believe that quite a few people have used it
to try out Spark quickly without using any cloud provider. Also someone
apparently used it successfully on Azure:

http://govindkanshi.wordpress.com/2013/12/09/spark-on-azure-using-docker-works/


Note that the scripts assume that the cluster runs on a single fat box.
Before adding inter-node setup inside the scripts (via iptables, route,
etc.) I would wait a bit for the next few Docker releases. I have heard
that there are some exciting things coming up that would make this much
easier to implement.

Andre


>>
>> https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/
>> https://github.com/amplab/docker-scripts


Re: Spark on google compute engine

Posted by Aureliano Buendia <bu...@gmail.com>.
On Mon, Jan 13, 2014 at 5:59 PM, Josh Rosen <ro...@gmail.com> wrote:

> If you'd like to use Spark with Docker, the AMPLab's Docker scripts might
> be a nice starting point:
>

Is it used in production?


>
>
> https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/
> https://github.com/amplab/docker-scripts

Re: Spark on google compute engine

Posted by Josh Rosen <ro...@gmail.com>.
If you'd like to use Spark with Docker, the AMPLab's Docker scripts might
be a nice starting point:

https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/
https://github.com/amplab/docker-scripts


On Mon, Jan 13, 2014 at 8:51 AM, Mayur Rustagi <ma...@gmail.com>wrote:

> They are in active development, but they themselves ask not to be used in
> production. I can create one to play around with, I guess.
> Regards
> Mayur

Re: Spark on google compute engine

Posted by Mayur Rustagi <ma...@gmail.com>.
They are in active development, but they themselves ask not to be used in
production. I can create one to play around with, I guess.
Regards
Mayur

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Mon, Jan 13, 2014 at 9:40 PM, Aureliano Buendia <bu...@gmail.com>wrote:

> Why not use Docker for Spark? GCE recently added support for custom kernels,
> which allows Docker to run on it.
>
> A Docker image can be shared between GCE and EC2.

Re: Spark on google compute engine

Posted by "Evan R. Sparks" <ev...@gmail.com>.
Andre Schumacher built one as part of a blog post in October -
https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/


On Mon, Jan 13, 2014 at 9:09 AM, Xuefeng Wu <be...@gmail.com> wrote:

> It is really a good idea to build a docker container for spark.

Re: Spark on google compute engine

Posted by Xuefeng Wu <be...@gmail.com>.
It is really a good idea to build a docker container for spark. 


Yours, Xuefeng Wu 吴雪峰 敬上

> On January 14, 2014, at 12:10 AM, Aureliano Buendia <bu...@gmail.com> wrote:
> 
> Why not use Docker for Spark? GCE recently added support for custom kernels, which allows Docker to run on it.
> 
> A Docker image can be shared between GCE and EC2.

Re: Spark on google compute engine

Posted by Aureliano Buendia <bu...@gmail.com>.
Why not use Docker for Spark? GCE recently added support for custom kernels,
which allows Docker to run on it.

A Docker image can be shared between GCE and EC2.


On Mon, Jan 13, 2014 at 8:56 AM, Mayur Rustagi <ma...@gmail.com>wrote:

> Hi,
> I have set up a basic guide here:
>
>
> http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_Google_Compute_Engine
>
> Unfortunately, GCE doesn't provide an easy way to manage images. (It has an
> interface to download the tar.gz image and then reload it, but I didn't find
> it as convenient as AMIs.) If you create one, let me know and I'll update the
> guide accordingly. If you create a shareable image, let me know and I'll host
> it on S3 and make it available to all.
>
> Regards
> Mayur

Re: Spark on google compute engine

Posted by Mayur Rustagi <ma...@gmail.com>.
Hi,
I have set up a basic guide here:

http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_Google_Compute_Engine

Unfortunately, GCE doesn't provide an easy way to manage images. (It has an
interface to download the tar.gz image and then reload it, but I didn't find
it as convenient as AMIs.) If you create one, let me know and I'll update the
guide accordingly. If you create a shareable image, let me know and I'll host
it on S3 and make it available to all.

Regards
Mayur


Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Mon, Jan 13, 2014 at 11:01 AM, Debasish Das <de...@gmail.com>wrote:

> Hi Aureliano,
>
> Look for the Google Compute Engine scripts in the Typesafe repo. They recently
> tested Akka Cluster on 2400 nodes on Google Compute Engine. You should be
> able to reuse the scripts.
>
> Thanks.
> Deb

Re: Spark on google compute engine

Posted by Debasish Das <de...@gmail.com>.
Hi Aureliano,

Look for the Google Compute Engine scripts in the Typesafe repo. They recently
tested Akka Cluster on 2400 nodes on Google Compute Engine. You should be
able to reuse the scripts.

Thanks.
Deb


On Sun, Jan 12, 2014 at 8:00 PM, Aureliano Buendia <bu...@gmail.com>wrote:

> Hi,
>
> Has anyone worked on a script similar to spark-ec2 for google compute
> engine?
>
> Google Compute Engine claims faster instance start-up times, and that,
> together with per-minute billing, makes it a desirable choice for Spark.
>