Posted to user@spark.apache.org by guxiaobo1982 <gu...@qq.com> on 2014/01/02 05:50:30 UTC

Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

Spark 0.8.1 is released now; do you mean we can share cached RDDs using this version?
  

 

 ------------------ Original ------------------
  From:  "Sriram Ramachandrasekaran"<sr...@gmail.com>;
 Date:  Jan 2, 2014
 To:  "user"<us...@spark.incubator.apache.org>; 
 
 Subject:  Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

 Yes, the driver runs on the machine from which you launch your Spark job. As for sharing cached RDDs, I don't think that's possible as of 0.8.1; RDDs are not available across Spark contexts, if my understanding is right.

 If you still want to share RDDs, you might have to write a single service that maintains the cached RDD, and have the various other apps that want to access that RDD talk to that service. If I understand right, Shark handles SQL queries like this.
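 The single-service pattern described above can be sketched in a few lines. This is an illustration of the pattern only, not Spark code: `CachedDatasetService` and its methods are hypothetical names, and in a real deployment the service would be one long-running Spark driver holding the cached RDD while the other apps talk to it over RPC or HTTP.

```python
# Sketch of the shared-cache-service pattern: one long-lived service owns the
# expensive-to-build dataset; every other app queries it instead of rebuilding.
# All names here are illustrative, not a real Spark API.

class CachedDatasetService:
    """Builds the dataset lazily, exactly once, and serves all queries from cache."""

    def __init__(self, build_fn):
        self._build_fn = build_fn   # expensive computation (stand-in for building an RDD)
        self._cache = None
        self.build_count = 0        # lets us verify the build really happens only once

    def _dataset(self):
        if self._cache is None:     # build on first access only
            self._cache = self._build_fn()
            self.build_count += 1
        return self._cache

    def query(self, predicate):
        """Client apps call this instead of recomputing the dataset themselves."""
        return [row for row in self._dataset() if predicate(row)]


# Two independent "apps" share the same service, and therefore one build.
service = CachedDatasetService(lambda: list(range(10)))
evens = service.query(lambda x: x % 2 == 0)
big = service.query(lambda x: x > 7)
```

 The point of the design is that the cache lives in one process, so every client sees the same copy and the expensive build runs once, no matter how many apps query it.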

 

 On Tue, Dec 31, 2013 at 7:46 PM, guxiaobo1982 <gu...@qq.com> wrote:
  We have different developers sharing a Spark cluster, and we don't let developers touch the master server. Each developer will submit their application from their desktop; does each driver then run on their desktop?
 
 By the way, can developers share cached RDDs?
 
 
  

 

 ------------------ Original ------------------
  Sender: "Mayur Rustagi"<ma...@gmail.com>;
 Send time: Tuesday, Dec 31, 2013 10:11 PM
 To: "user"<us...@spark.incubator.apache.org>; 
 
 Subject: Re: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

 The driver is the process that manages execution across the cluster. Say your application is a SQL query: the system spawns a shark-cli driver that uses the Spark framework, HDFS, etc. to execute the query and deliver the result. All of this happens automatically, so as a user of the Spark/Shark framework you don't need to worry about it. Just go for a bigger machine for the master.
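 The driver's coordinating role can be illustrated with a small sketch. This is not Spark internals, just the shape of the idea: the driver splits a job into per-partition tasks, ships them to workers, and reduces the partial results. Here worker threads stand in for cluster executors, and `count_words`/`driver` are made-up names for the example.

```python
# Illustrative sketch of the driver's role: split the job into tasks, run one
# task per partition on the "cluster", then combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def count_words(lines):
    """The per-partition task the driver ships out to a worker."""
    return sum(len(line.split()) for line in lines)

def driver(dataset, n_partitions=4):
    # Round-robin the records into partitions, one task per partition.
    partitions = [dataset[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as cluster:
        partials = cluster.map(count_words, partitions)  # "execute on the cluster"
        return sum(partials)                             # reduce on the driver
```

 This is why the driver's host matters for sizing: scheduling state and the final reduce live in that one process, wherever the job was launched from.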


 
  Mayur Rustagi
Ph: +919632149971  http://www.sigmoidanalytics.com
  https://twitter.com/mayur_rustagi
 
 On Tue, Dec 31, 2013 at 7:01 PM, guxiaobo1982 <gu...@qq.com> wrote:
  Thanks for your reply. I am new to Spark; does "driver" mean the server where user applications are submitted?
 

  

 

 ------------------ Original ------------------
  Sender: "Mayur Rustagi"<ma...@gmail.com>;
 Send time: Tuesday, Dec 31, 2013 9:55 PM
 To: "user"<us...@spark.incubator.apache.org>; 
 
 Subject: Re: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

 The master server needs to be a little beefy, as the driver runs on it. We ran into some scaling issues because of the master server. You can offload the drivers to workers or other machines; then the master server can be smaller.

Regards,
Mayur

 
  Mayur Rustagi
Ph: +919632149971  http://www.sigmoidanalytics.com
  https://twitter.com/mayur_rustagi
 
 On Tue, Dec 31, 2013 at 6:48 PM, guxiaobo1982 <gu...@qq.com> wrote:
  Hi,
 

 I read the following article regarding hardware configurations for the worker servers in standalone cluster mode, but what about the master server?
 

 http://spark.incubator.apache.org/docs/latest/hardware-provisioning.html
 

 

 Regards,
 

 Xiaobo Gu
 
-- 
It's just about how deep your longing is!

Re: Reply: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

Posted by Ashish Rangole <ar...@gmail.com>.
One can take a look at the Tachyon project to share RDDs across various Spark contexts.
On Jan 1, 2014 10:55 PM, "jasonliu" <ja...@gmail.com> wrote:

> Actually, we can't, even in 0.8.1.

Reply: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

Posted by jasonliu <ja...@gmail.com>.
Actually, we can't, even in 0.8.1.

 

From: guxiaobo1982 [mailto:guxiaobo1982@qq.com]
Sent: Jan 2, 2014 12:51
To: user
Subject: Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

Spark 0.8.1 is released now; do you mean we can share cached RDDs using this version?

 

 

------------------ Original ------------------

From:  "Sriram Ramachandrasekaran"<sr...@gmail.com>;

Date:  Jan 2, 2014

To:  "user"<us...@spark.incubator.apache.org>; 

Subject:  Re: Reply: Reply: Any best practice for hardware configuration
forthemasterserver in standalone cluster mode?

 

Yes the driver would run on the machine from which you launch your spark
job. As for sharing cached RDDs, I don't think it's possible up until 0.8.1.
The RDDs are not available across spark contexts, if my understanding is
right. 

 

If you still want to share RDDs, then you might have write a single service
that maintains the cached RDD and the various other apps that want to access
that RDD talk to that service. If I understand right, Shark handles SQL
queries like this.

 

On Tue, Dec 31, 2013 at 7:46 PM, guxiaobo1982 <gu...@qq.com> wrote:

We have different developers sharing a Spark cluster, and we don't let
developers touch the master server. Each of the developers will commit their
application from their desktop, then does each driver run on their desktops?

Buy the way, can developers share cached RDDs.

 

 

------------------ Original ------------------

Sender: "Mayur Rustagi"<ma...@gmail.com>;

Send time: Tuesday, Dec 31, 2013 10:11 PM

To: "user"<us...@spark.incubator.apache.org>; 

Subject: Re: Reply: Any best practice for hardware configuration for
themasterserver in standalone cluster mode?

 

Driver is the process that manages the execution across the cluster. So say
your application is a "sql query" then the system spawns a shark-cli-driver
that uses spark framework, hdfs etc to execute the query and deliver result.
All this happens automatically so you dont need to worry about it as a user
of spark/shark framework. Just go for a bigger machine with a master. 

 




Mayur Rustagi
Ph: +919632149971 

h <https://twitter.com/mayur_rustagi> ttp://www.sigmoidanalytics.com
<http://www.sigmoidanalytics.com/> 

https://twitter.com/mayur_rustagi

 

 

On Tue, Dec 31, 2013 at 7:01 PM, guxiaobo1982 <gu...@qq.com> wrote:

Thanks for your reply, I am new hand at Spark, does driver mean the server
where user applications are commit?

 

 

 

------------------ Original ------------------

Sender: "Mayur Rustagi"<ma...@gmail.com>;

Send time: Tuesday, Dec 31, 2013 9:55 PM

To: "user"<us...@spark.incubator.apache.org>; 

Subject: Re: Any best practice for hardware configuration for the
masterserver in standalone cluster mode?

 

Master server needs to be a little beefy as the driver runs on it. We ran
into some issues around scaling due to master servers. You can offload the
drivers to workers or other machines then the master server can be smaller. 

Regards
Mayur




Mayur Rustagi
Ph: +919632149971 

h <https://twitter.com/mayur_rustagi> ttp://www.sigmoidanalytics.com
<http://www.sigmoidanalytics.com/> 

https://twitter.com/mayur_rustagi

 

 

On Tue, Dec 31, 2013 at 6:48 PM, guxiaobo1982 <gu...@qq.com> wrote:

Him

 

I read the following article regarding to hardware configurations for the
worker servers in the standalone cluster mode, but what about the master
server?

 

http://spark.incubator.apache.org/docs/latest/hardware-provisioning.html

 

 

Regards,

 

Xiaobo Gu

 

 

 





 

-- 
It's just about how deep your longing is!