Posted to user@spark.apache.org by Sai Prasanna <an...@gmail.com> on 2014/03/26 13:54:38 UTC

Distributed running in Spark Interactive shell

Is it possible to run across a cluster using the Spark interactive shell?

To be more explicit, is the procedure similar to running standalone
master-slave Spark?

I want to execute my code in the interactive shell on the master node, and
it should run across the cluster [say 5 nodes]. Is the procedure similar?





-- 
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*


*Entire water in the ocean can never sink a ship, Unless it gets inside.
All the pressures of life can never hurt you, Unless you let them in.*

Re: Distributed running in Spark Interactive shell

Posted by Sai Prasanna <an...@gmail.com>.
Thanks Chen, it's a bit clearer now and it's up and running...

1) In the web UI, only the memory used per node is given. I can find it in
the logs, but is there a port over which I can monitor memory usage, GC
memory overhead, and RDD creation in a UI?

Re: Distributed running in Spark Interactive shell

Posted by giive chen <th...@gmail.com>.
This response is for Sai.

The easiest way to verify your current spark-shell setting is just to type
"sc.master".

If your setting is correct, it should return:
scala> sc.master
res0: String = spark://master.ip.url.com:5050

If your SPARK_MASTER_IP is not set correctly, it will respond:
scala> sc.master
res0: String = local

That means your spark-shell is running in local mode.

You can also check on the Spark master's web UI: you should see a spark-shell
application in the master's application list.
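
If sc.master shows the spark:// URL, a quick sanity check is to run a small
job and watch its tasks appear under that application in the master's web UI.
Just a sketch (the numbers and output are illustrative):

scala> // distribute 1..1000 over 10 partitions; the count runs on the executors
scala> sc.parallelize(1 to 1000, 10).count()
res1: Long = 1000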

Wisely Chen

Re: Distributed running in Spark Interactive shell

Posted by Nan Zhu <zh...@gmail.com>.
and, yes, I think that picture is a bit misleading, though the paragraph that follows it does mention that:

"
Because the driver schedules tasks on the cluster, it should be run close to the worker nodes, preferably on the same local area network. If you'd like to send requests to the cluster remotely, it's better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes.
"

--  
Nan Zhu




Re: Distributed running in Spark Interactive shell

Posted by Nan Zhu <zh...@gmail.com>.
master does more work than that actually; I just explained why he should set SPARK_MASTER_IP correctly.

a simplified list:

1. maintain the worker status

2. maintain the in-cluster driver status

3. maintain the executor status (the worker tells the master what happened on its executors)



-- 
Nan Zhu





Re: Distributed running in Spark Interactive shell

Posted by Yana Kadiyska <ya...@gmail.com>.
Nan (or anyone who feels they understand the cluster architecture well),
can you clarify something for me.

From reading this user group and your explanation above, it appears that the
cluster master is only involved during application startup -- to allocate
executors (from what you wrote, it sounds like the driver itself passes the
jobs/tasks to the executors). From there onwards all computation is done on
the executors, which communicate results directly to the driver when certain
actions (say collect) are performed. Is that right? The only description of
the cluster I've seen came from here:
https://spark.apache.org/docs/0.9.0/cluster-overview.html but that picture
suggests there is no direct communication between driver and executors,
which I believe is wrong (unless I am misreading the picture -- I believe
Master and "Cluster Manager" refer to the same thing?).

The very short form of my question is: does the master do anything other
than executor allocation?
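
For what it's worth, that direct executor-to-driver path is easy to see from
the shell; a minimal sketch (assuming a running cluster, the values are
illustrative):

scala> // the map runs on the executors; collect() ships the results back to the driver
scala> sc.parallelize(1 to 10, 5).map(x => x * x).collect()
res2: Array[Int] = Array(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)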



Re: Distributed running in Spark Interactive shell

Posted by Nan Zhu <zh...@gmail.com>.
All you need to do is ensure your Spark cluster is running well (you can check by accessing the Spark UI to see whether all workers are displayed),

then you have to set the correct SPARK_MASTER_IP on the machine where you run spark-shell.

In more detail:

when you run bin/spark-shell, it starts the driver program on that machine, which interacts with the Master to start the application (in this case, spark-shell),

the Master tells the Workers to start executors for your application, and the executors register with your driver,

then your driver can distribute tasks to the executors, i.e. run in a distributed fashion.
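
As a rough way to confirm that the tasks really land on different machines, here is a small sketch from the shell (the partition count and output are illustrative, and it assumes a working cluster):

scala> // run one task per partition and record which host executed it
scala> val names = sc.parallelize(1 to 100, 8).mapPartitions(_ => Iterator(java.net.InetAddress.getLocalHost.getHostName))
scala> names.collect().distinct.foreach(println)

If everything is wired up, the distinct hostnames printed should be your worker nodes rather than only the machine running the shell.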


Best, 

-- 
Nan Zhu




Re: Distributed running in Spark Interactive shell

Posted by Sai Prasanna <an...@gmail.com>.
Nan Zhu, it's the latter, I want to distribute the tasks to the cluster
[machines available.]

If I set SPARK_MASTER_IP on the other machines and put the slaves' IPs in
conf/slaves on the master node, will the interactive-shell code run at the
master get distributed across multiple machines?






-- 
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*


*Entire water in the ocean can never sink a ship, Unless it gets inside.
All the pressures of life can never hurt you, Unless you let them in.*

Re: Distributed running in Spark Interactive shell

Posted by Nan Zhu <zh...@gmail.com>.
what do you mean by run across the cluster?

do you want to start the spark-shell across the cluster, or do you want to distribute tasks to multiple machines?

if it's the former, yes, as long as you indicate the right master URL

if it's the latter, also yes; you can observe the distributed tasks in the Spark UI

-- 
Nan Zhu

