Posted to user@spark.apache.org by David Thomas <dt...@gmail.com> on 2014/02/17 03:19:01 UTC

Connecting an Application to the Cluster

From the docs <https://spark.incubator.apache.org/docs/latest/spark-standalone.html>:


*Connecting an Application to the Cluster*

*To run an application on the Spark cluster, simply pass the spark://IP:PORT
URL of the master to the SparkContext constructor.*
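
I take that to mean something like the following (a sketch on my part,
assuming the SparkConf API; the host, port, and jar path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Connect to a standalone master by its spark://IP:PORT URL.
val conf = new SparkConf()
  .setAppName("MyApp")
  .setMaster("spark://master-host:7077")  // placeholder master URL
  .setJars(Seq("target/my-app.jar"))      // ship the application jar to the workers
val sc = new SparkContext(conf)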

Could someone enlighten me on what happens if I run the app from, say,
Eclipse on my local machine, but use the URL of a master node that is in the
cloud? What role does my local JVM play then?

Re: Connecting an Application to the Cluster

Posted by Christopher Nguyen <ct...@adatao.com>.
David, actually, it's the driver that "creates" and holds a reference to
the SparkContext. The master in this context is only a resource manager: it
provides information about the cluster, knowing where the workers are, how
many there are, and so on.

The SparkContext object can get serialized/deserialized and
instantiated/made available elsewhere (e.g., on the worker nodes), but that
level of precision doesn't apply directly to the question you're asking.

So yes, if you do collect(), you will be able to see the results on your
local console.
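
For example (a sketch, assuming sc is your SparkContext; the values are
made up):

val rdd = sc.parallelize(1 to 100)      // data distributed across the workers
val doubled = rdd.map(_ * 2).collect()  // computed on the cluster, array returned to the driver
doubled.take(5).foreach(println)        // prints in your local (e.g., Eclipse) console
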
--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen


Re: Connecting an Application to the Cluster

Posted by purav aggarwal <pu...@gmail.com>.
The data would get aggregated on the master node.
Since the JVM for the application is invoked from your local machine (the
Spark driver), I think you might be able to print it on your console.


Re: Connecting an Application to the Cluster

Posted by "Michael (Bach) Bui" <fr...@adatao.com>.
It used to be that you had to read the Spark code to figure this out.
However, the Spark team has recently published this information here:
http://spark.incubator.apache.org/docs/latest/cluster-overview.html

Re: Connecting an Application to the Cluster

Posted by purav aggarwal <pu...@gmail.com>.
Sorry for the incorrect information. Where can I pick up these
architectural/design concepts for Spark?
I seem to have misunderstood the responsibilities of the master and the
driver.

Re: Connecting an Application to the Cluster

Posted by David Thomas <dt...@gmail.com>.
Thanks everyone, it all makes sense now.

Re: Connecting an Application to the Cluster

Posted by "Michael (Bach) Bui" <fr...@adatao.com>.
Spark has the concepts of a Driver and a Master.

The Driver is the Spark program that you run on your local machine; the
SparkContext resides in the driver together with the DAG scheduler.
The Master is responsible for managing cluster resources, e.g. giving the
Driver the workers it needs. The Master can be a Mesos master (for a Mesos
cluster), a Spark master (for a Spark standalone cluster), or the
ResourceManager (for a Hadoop cluster).
Given the resources assigned by the Master, the Driver uses the DAG to
assign tasks to the workers.

So yes, the results of Spark's actions will be sent back to the driver,
i.e. to your local console.
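
For example, only the master URL changes with the cluster type (a sketch;
hosts and ports are placeholders, and the YARN mode strings are the ones I
believe are current as of Spark 0.9):

import org.apache.spark.{SparkConf, SparkContext}

val standaloneUrl = "spark://master-host:7077"  // Spark standalone master
val mesosUrl      = "mesos://mesos-host:5050"   // Mesos master
// For Hadoop/YARN: setMaster("yarn-client") or setMaster("yarn-standalone")

val sc = new SparkContext(
  new SparkConf().setAppName("demo").setMaster(standaloneUrl))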


Re: Connecting an Application to the Cluster

Posted by David Thomas <dt...@gmail.com>.
So if I do a Spark action, say collect, will I be able to see the result
on my local console? Or would it be available only on the cluster master?


Re: Connecting an Application to the Cluster

Posted by purav aggarwal <pu...@gmail.com>.
Your local machine simply submits your job (in the form of a jar) to the
cluster.
The master node is where the SparkContext object is created, a DAG of your
job is formed, and tasks (stages) are assigned to different workers, which
are not aware of anything but the computation of the task assigned to them.


Re: Connecting an Application to the Cluster

Posted by David Thomas <dt...@gmail.com>.
Where is the SparkContext object created then? On my local machine or on
the master node in the cluster?


Re: Connecting an Application to the Cluster

Posted by Nhan Vu Lam Chi <nh...@adatao.com>.
Your local app will be called the "driver program"; it creates jobs and
submits them to the cluster for running.
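
For example (a sketch; nothing is submitted until the action runs):

val words = sc.parallelize(Seq("a", "b", "a"))  // driver-side handle to distributed data
val pairs = words.map(w => (w, 1))              // lazy: only builds the lineage, no job yet
val total = pairs.count()                       // action: the driver submits a job to the cluster
println(s"total = $total")                      // the result comes back to the driver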
