Posted to user@spark.apache.org by Mathieu Longtin <ma...@closetwork.org> on 2016/05/19 17:45:48 UTC

Starting executor without a master

First a bit of context:
We use Spark on a platform where each user starts workers as needed. This
has the advantage that all permission management is handled by the OS, so
the users can only read files they have permission to.

To do this, we have some utility that does the following:
- start a master
- start worker managers on a number of servers
- "submit" the Spark driver program
- the driver then talks to the master, telling it how many executors it needs
- the master tells the worker nodes to start executors and talk to the driver
- the executors are started

From here on, the master doesn't do much, and neither do the process managers
on the worker nodes.
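
Roughly, the utility boils down to commands like the following. This is just
a sketch: host, port, and the "run-on-n-servers" dispatch command are
placeholders, not our exact setup.

# 1. start a master on some host
$SPARK_HOME/sbin/start-master.sh
# 2. ask our scheduler to run a worker manager on N servers (placeholder dispatcher)
run-on-n-servers "$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port"
# 3. submit the driver, which then negotiates executors through the master
$SPARK_HOME/bin/spark-submit --master spark://$host:$port \
    --executor-memory 4g python-script.py with args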

What I would like to do is simplify this to:
- Start the driver program
- Start executors on a number of servers, telling them where to find the
driver
- The executors connect directly to the driver

Is there a way I could do this without the master and worker managers?

Thanks!


-- 
Mathieu Longtin
1-514-803-8977

Re: Starting executor without a master

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Thu, May 19, 2016 at 6:06 PM, Mathieu Longtin <ma...@closetwork.org> wrote:
> I'm looking to bypass the master entirely. I manage the workers outside of
> Spark. So I want to start the driver, then start workers that connect
> directly to the driver.

It should be possible to do that if you extend the interface I
mentioned. I didn't mean "master" the daemon process; I meant the
master branch of Spark on GitHub.


-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
I'm looking to bypass the master entirely. I manage the workers outside of
Spark. So I want to start the driver, then start workers that connect
directly to the driver.

Anyway, it looks like I will have to live with our current solution for a
while.

On Thu, May 19, 2016 at 8:32 PM Marcelo Vanzin <va...@cloudera.com> wrote:

> Hi Mathieu,
>
> There's nothing like that in Spark currently. For that, you'd need a
> new cluster manager implementation that knows how to start executors
> in those remote machines (e.g. by running ssh or something).
>
> In the current master there's an interface you can implement to try
> that if you really want to (ExternalClusterManager), but it's
> currently "private[spark]" and it probably wouldn't be a very simple
> task.
>
>
> On Thu, May 19, 2016 at 10:45 AM, Mathieu Longtin
> <ma...@closetwork.org> wrote:
> > First a bit of context:
> > We use Spark on a platform where each user start workers as needed. This
> has
> > the advantage that all permission management is handled by the OS, so the
> > users can only read files they have permission to.
> >
> > To do this, we have some utility that does the following:
> > - start a master
> > - start worker managers on a number of servers
> > - "submit" the Spark driver program
> > - the driver then talks to the master, tell it how many executors it
> needs
> > - the master tell the worker nodes to start executors and talk to the
> driver
> > - the executors are started
> >
> > From here on, the master doesn't do much, neither do the process manager
> on
> > the worker nodes.
> >
> > What I would like to do is simplify this to:
> > - Start the driver program
> > - Start executors on a number of servers, telling them where to find the
> > driver
> > - The executors connect directly to the driver
> >
> > Is there a way I could do this without the master and worker managers?
> >
> > Thanks!
> >
> >
> > --
> > Mathieu Longtin
> > 1-514-803-8977
>
>
>
> --
> Marcelo
>
-- 
Mathieu Longtin
1-514-803-8977

Re: Starting executor without a master

Posted by Marcelo Vanzin <va...@cloudera.com>.
Hi Mathieu,

There's nothing like that in Spark currently. For that, you'd need a
new cluster manager implementation that knows how to start executors
on those remote machines (e.g. by running ssh or something).

In the current master there's an interface you can implement to try
that if you really want to (ExternalClusterManager), but it's
currently "private[spark]" and it probably wouldn't be a very simple
task.


On Thu, May 19, 2016 at 10:45 AM, Mathieu Longtin
<ma...@closetwork.org> wrote:
> First a bit of context:
> We use Spark on a platform where each user start workers as needed. This has
> the advantage that all permission management is handled by the OS, so the
> users can only read files they have permission to.
>
> To do this, we have some utility that does the following:
> - start a master
> - start worker managers on a number of servers
> - "submit" the Spark driver program
> - the driver then talks to the master, tell it how many executors it needs
> - the master tell the worker nodes to start executors and talk to the driver
> - the executors are started
>
> From here on, the master doesn't do much, neither do the process manager on
> the worker nodes.
>
> What I would like to do is simplify this to:
> - Start the driver program
> - Start executors on a number of servers, telling them where to find the
> driver
> - The executors connect directly to the driver
>
> Is there a way I could do this without the master and worker managers?
>
> Thanks!
>
>
> --
> Mathieu Longtin
> 1-514-803-8977



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
This is roughly what happens:

   1. When a worker starts, it connects to the master and says "I'm here, I
   have 64G of memory and 8 cores available" (for example)
   2. When you run spark-submit with a --master option, it starts a driver.
   3. The driver talks to the master and says "I want 200 cores with 4G of
   memory each"
   4. The master looks at all the workers that signed up, selects some with
   the right resources, and tells them "start an executor with 4 cores and 16G
   and have it talk to the driver"
   5. The executor is started by the worker and connects to the driver. From
   here on, the driver tells the executor what to do.
   6. When the driver exits, the executors exit as well, and the workers
   tell the master that they have more resources available.

The master never starts an executor or worker itself. The worker starts the
executor. The master does not run "start-slave.sh" ever. The worker doesn't
do actual work; it just starts executors upon request from the master.

Btw, spark.master in the environment tab is
spark://hostname.where.I.run.the.master:13219.

If, however, you run spark-submit with no --master option, this is what
happens:

   1. spark-submit starts the driver
   2. In the same JVM instance, spark-submit starts an executor
   3. The driver tells that executor what to do

No master process is involved, no worker process is involved.
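
Concretely, the two invocations look something like this (host, port, and the
script name are placeholders from my earlier example):

# with a master: executors run on whatever workers the master hands out
$SPARK_HOME/bin/spark-submit --master spark://$host:$port \
    --executor-memory 4g python-script.py with args

# local mode (what I get when I leave out --master, or pass --master local):
# the driver and a local executor share one JVM, no master or worker involved
$SPARK_HOME/bin/spark-submit --master "local[*]" \
    --executor-memory 4g python-script.py with args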


On Fri, May 20, 2016 at 12:56 PM Mich Talebzadeh <mi...@gmail.com>
wrote:

> Whenever, you start the slave from the master, a spark worker process will
> start on the slave node like below. This is on slave node started from
> master and it knowswhere the master is
>
>
> hduser    6300  0.2  4.2 1488388 170280 ?      Sl   15:31   0:13
> /usr/java/jdk1.7.0_79/jre/bin/java -cp
> /usr/lib/spark-1.6.1-bin-hadoop2.6/conf/:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar
> -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker
> --webui-port 8081 *spark://50.140.197.217:7077
> <http://50.140.197.217:7077>*
>
> Now back to your notes you stated
>
> This is the spark-submit command, which runs on my local server:
> $SPARK_HOME/bin/spark-submit --master spark://$host:$port
> --executor-memory 4g python-script.py with args
>
> If I want 200 worker cores, I tell the cluster scheduler to run this
> command on 200 cores:
> $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g --master
> spark://$host:$port
>
> so the process is the same, whether you start the slave job from master or
> you start it locally on each worker/slave through
> local  $SPARK_HOME/sbin/start-slave.sh. In your Spark job:4040/environment
> what is the value for
> spark.master please. I would have thought taht spark-submit through driver
> allocates work to worker processes not other way round. However, it would
> be interesting how these slave workers get the allocated work
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 20 May 2016 at 16:48, Mathieu Longtin <ma...@closetwork.org> wrote:
>
>> If you give spark-submit a master address in the form of *--master
>> spark://masterhost:port*, it will connect to that master and grab all
>> the workers the master will give it unless you specifically limit it. If
>> you run spark-submit without a --master (or with --master local), it will
>> run without ever talking to the master using local cores only.
>>
>> Btw, the only processes that looks at *conf/slaves* are start-slaves.sh
>> and stop-slaves.sh. The master doesn't look at it, it will give work to any
>> workers that connect to it.
>>
>> And yes, when I look at the executor tab, I see as many executors as I
>> asked for, not just one or two, and they all exhibit signs of doing
>> something (I/O and CPU).
>>
>>
>> On Fri, May 20, 2016 at 11:01 AM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Ok so you have all the nodes names in slaves?
>>>
>>> I am sure you can do sbin/start-slaves.sh and a slave/work will start
>>> on every node specified in slaves.
>>>
>>> Now in that case the driver will know about the existence of all these
>>> workers. For example this is interesting. I run a standalone cluster but
>>> decide to start two works on two nodes
>>>
>>> This is in my slaves file
>>>
>>> localhost
>>> rhes564
>>> rhes564
>>> rhes5
>>> rhes5
>>>
>>> And it goes ahead and starts 5 worker processes as seen below
>>>
>>> [image: image.png]
>>>
>>> Regardless these are all started on the relevant nodes and the master
>>> knows about it.
>>>
>>> However, when I start a spark-submit job there is only one driver and
>>> one executor. That is expected as we only submitted one spark-submit job.
>>>
>>>
>>>
>>>
>>>
>>>  [image: image.png]
>>> So it would be interesting to know what you see in the executor tab on
>>> port 40:40 spark GUI..
>>>
>>>
>>> HTH
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 20 May 2016 at 14:00, Mathieu Longtin <ma...@closetwork.org> wrote:
>>>
>>>> Correct, what I do to start workers is the equivalent of
>>>> start-slaves.sh. It ends up running the same command on the worker servers
>>>> as start-slaves does.
>>>>
>>>> It definitively uses all workers, and workers starting later pick up
>>>> work as well. If you have a long running job, you can add workers
>>>> dynamically and they will pick up work as long as there are enough
>>>> partitions to go around.
>>>>
>>>> I set spark.locality.wait to 0 so that workers never wait to pick up
>>>> tasks.
>>>>
>>>>
>>>>
>>>> On Fri, May 20, 2016 at 2:57 AM Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> OK this is basically form my notes for Spark standalone. Worker
>>>>> process is the slave process
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>>
>>>>> You start worker as you showed
>>>>>
>>>>> $SPARK_HOME/sbin/start-slaves.sh
>>>>> Now that picks up the worker host node names from
>>>>> $SPARK_HOME/conf/slaves files. So you still have to tell Spark where to run
>>>>> workers.
>>>>>
>>>>> However, if I am correct regardless of what you have specified in
>>>>> slaves, in this standalone mode there will not be any spark process spawned
>>>>> by the driver on the slaves. In all probability you will be running one
>>>>> spark-submit process on the driver node. You can see this through the
>>>>> output of
>>>>>
>>>>> jps|grep SparkSubmit
>>>>>
>>>>> and you will see the details by running jmonitor for that SparkSubmit
>>>>> job
>>>>>
>>>>> However, I still doubt whether Scheduling Across applications is
>>>>> feasible in standalone mode.
>>>>>
>>>>> The doc says
>>>>>
>>>>> *Standalone mode:* By default, applications submitted to the
>>>>> standalone mode cluster will run in FIFO (first-in-first-out) order, and
>>>>> each application will try to use *all available nodes*. You can limit
>>>>> the number of nodes an application uses by setting the spark.cores.max
>>>>> configuration property in it, or change the default for applications that
>>>>> don’t set this setting through spark.deploy.defaultCores. Finally, in
>>>>> addition to controlling cores, each application’s
>>>>> spark.executor.memory setting controls its memory use.
>>>>>
>>>>> It uses the word all available nodes but I am not convinced if it will
>>>>> use those nodes? Someone can possibly clarify this
>>>>>
>>>>> HTH
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 20 May 2016 at 02:03, Mathieu Longtin <ma...@closetwork.org>
>>>>> wrote:
>>>>>
>>>>>> Okay:
>>>>>> *host=my.local.server*
>>>>>> *port=someport*
>>>>>>
>>>>>> This is the spark-submit command, which runs on my local server:
>>>>>> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
>>>>>> --executor-memory 4g python-script.py with args*
>>>>>>
>>>>>> If I want 200 worker cores, I tell the cluster scheduler to run this
>>>>>> command on 200 cores:
>>>>>> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g
>>>>>> spark://$host:$port *
>>>>>>
>>>>>> That's it. When the task starts, it uses all available workers. If
>>>>>> for some reason, not enough cores are available immediately, it still
>>>>>> starts processing with whatever it gets and the load will be spread further
>>>>>> as workers come online.
>>>>>>
>>>>>>
>>>>>> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> In a normal operation we tell spark which node the worker processes
>>>>>>> can run by adding the nodenames to conf/slaves.
>>>>>>>
>>>>>>> Not very clear on this in your case all the jobs run locally with
>>>>>>> say 100 executor cores like below:
>>>>>>>
>>>>>>>
>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>
>>>>>>>                 --master local[*] \
>>>>>>>
>>>>>>>                 --driver-memory xg \  --default would be 512M
>>>>>>>
>>>>>>>                 --num-executors=1 \   -- This is the constraint in
>>>>>>> stand-alone Spark cluster, whether specified or not
>>>>>>>
>>>>>>>                 --executor-memory=xG \ --
>>>>>>>
>>>>>>>                 --executor-cores=n \
>>>>>>>
>>>>>>> --master local[*] means all cores and --executor-cores in your case
>>>>>>> need not be specified? or you can cap it like above --executor-cores=n.
>>>>>>> If it is not specified then the Spark app will go and grab every
>>>>>>> core. Although in practice that does not happen it is just an upper
>>>>>>> ceiling. It is FIFO.
>>>>>>>
>>>>>>> What typical executor memory is specified in your case?
>>>>>>>
>>>>>>> Do you have a  sample snapshot of spark-submit job by any chance
>>>>>>> Mathieu?
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Mostly, the resource management is not up to the Spark master.
>>>>>>>>
>>>>>>>> We routinely start 100 executor-cores for 5 minute job, and they
>>>>>>>> just quit when they are done. Then those processor cores can do something
>>>>>>>> else entirely, they are not reserved for Spark at all.
>>>>>>>>
>>>>>>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Then in theory every user can fire multiple spark-submit jobs. do
>>>>>>>>> you cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf
>>>>>>>>> , but I guess in reality every user submits one job only.
>>>>>>>>>
>>>>>>>>> This is an interesting model for two reasons:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - It uses parallel processing across all the nodes or most of
>>>>>>>>>    the nodes to minimise the processing time
>>>>>>>>>    - it requires less intervention
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Driver memory is default. Executor memory depends on job, the
>>>>>>>>>> caller decides how much memory to use. We don't specify --num-executors as
>>>>>>>>>> we want all cores assigned to the local master, since they were started by
>>>>>>>>>> the current user. No local executor.  --master=spark://localhost:someport.
>>>>>>>>>> 1 core per executor.
>>>>>>>>>>
>>>>>>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Mathieu
>>>>>>>>>>>
>>>>>>>>>>> So it would be interesting to see what resources allocated in
>>>>>>>>>>> your case, especially the num-executors and executor-cores. I gather every
>>>>>>>>>>> node has enough memory and cores.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>>>>
>>>>>>>>>>>                 --master local[2] \
>>>>>>>>>>>
>>>>>>>>>>>                 --driver-memory 4g \
>>>>>>>>>>>
>>>>>>>>>>>                 --num-executors=1 \
>>>>>>>>>>>
>>>>>>>>>>>                 --executor-memory=4G \
>>>>>>>>>>>
>>>>>>>>>>>                 --executor-cores=2 \
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <mathieu@closetwork.org
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The driver (the process started by spark-submit) runs locally.
>>>>>>>>>>>> The executors run on any of thousands of servers. So far, I haven't tried
>>>>>>>>>>>> more than 500 executors.
>>>>>>>>>>>>
>>>>>>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> ok so you are using some form of NFS mounted file system
>>>>>>>>>>>>> shared among the nodes and basically you start the processes through
>>>>>>>>>>>>> spark-submit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In Stand-alone mode, a simple cluster manager included with
>>>>>>>>>>>>> Spark. It does the management of resources so it is not clear
>>>>>>>>>>>>> to me what you are referring as worker manager here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is my take from your model.
>>>>>>>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>>>>>>>> You only have one worker that lives within the driver JVM
>>>>>>>>>>>>> process.
>>>>>>>>>>>>> The Driver node runs on the same host that the cluster manager
>>>>>>>>>>>>> is running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>>>>>>>>> runs tasks for the Driver.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <
>>>>>>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> No master and no node manager, just the processes that do
>>>>>>>>>>>>>> actual work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We use the "stand alone" version because we have a shared
>>>>>>>>>>>>>> file system and a way of allocating computing resources already (Univa Grid
>>>>>>>>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Mathieu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <
>>>>>>>>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> First a bit of context:
>>>>>>>>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>>>>>>> - start a master
>>>>>>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>>>>>>> - the driver then talks to the master, tell it how many
>>>>>>>>>>>>>>>> executors it needs
>>>>>>>>>>>>>>>> - the master tell the worker nodes to start executors and
>>>>>>>>>>>>>>>> talk to the driver
>>>>>>>>>>>>>>>> - the executors are started
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> From here on, the master doesn't do much, neither do the
>>>>>>>>>>>>>>>> process manager on the worker nodes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>>>>>>> - Start the driver program
>>>>>>>>>>>>>>>> - Start executors on a number of servers, telling them
>>>>>>>>>>>>>>>> where to find the driver
>>>>>>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there a way I could do this without the master and
>>>>>>>>>>>>>>>> worker managers?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> Mathieu Longtin
>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Mathieu Longtin
>>>>>>>> 1-514-803-8977
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977
>>>>>>
>>>>>
>>>>> --
>>>> Mathieu Longtin
>>>> 1-514-803-8977
>>>>
>>>
>>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
> --
Mathieu Longtin
1-514-803-8977

Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
Whenever you start the slave from the master, a Spark worker process will
start on the slave node, like below. This is on a slave node started from the
master, and it knows where the master is:


hduser    6300  0.2  4.2 1488388 170280 ?      Sl   15:31   0:13
/usr/java/jdk1.7.0_79/jre/bin/java -cp
/usr/lib/spark-1.6.1-bin-hadoop2.6/conf/:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar
-Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker
--webui-port 8081 spark://50.140.197.217:7077

Now back to your notes, you stated:

This is the spark-submit command, which runs on my local server:
$SPARK_HOME/bin/spark-submit --master spark://$host:$port --executor-memory
4g python-script.py with args

If I want 200 worker cores, I tell the cluster scheduler to run this
command on 200 cores:
$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g --master
spark://$host:$port

So the process is the same whether you start the slave job from the master or
you start it locally on each worker/slave through
$SPARK_HOME/sbin/start-slave.sh. In your Spark job's 4040/environment page,
what is the value of spark.master, please? I would have thought that
spark-submit, through the driver, allocates work to the worker processes, not
the other way round. However, it would be interesting to see how these slave
workers get the allocated work.
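
As an aside, a quick way to see the effective master without going to the GUI
would be to add --verbose to the spark-submit command, something like:

$SPARK_HOME/bin/spark-submit --verbose --master spark://$host:$port \
    --executor-memory 4g python-script.py with args
# --verbose makes spark-submit print its parsed arguments and Spark properties,
# including the master URL, before the application starts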

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 May 2016 at 16:48, Mathieu Longtin <ma...@closetwork.org> wrote:

> If you give spark-submit a master address in the form of *--master
> spark://masterhost:port*, it will connect to that master and grab all the
> workers the master will give it unless you specifically limit it. If you
> run spark-submit without a --master (or with --master local), it will run
> without ever talking to the master using local cores only.
>
> Btw, the only processes that looks at *conf/slaves* are start-slaves.sh
> and stop-slaves.sh. The master doesn't look at it, it will give work to any
> workers that connect to it.
>
> And yes, when I look at the executor tab, I see as many executors as I
> asked for, not just one or two, and they all exhibit signs of doing
> something (I/O and CPU).
>
>
> On Fri, May 20, 2016 at 11:01 AM Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Ok so you have all the nodes names in slaves?
>>
>> I am sure you can do sbin/start-slaves.sh and a slave/work will start on
>> every node specified in slaves.
>>
>> Now in that case the driver will know about the existence of all these
>> workers. For example this is interesting. I run a standalone cluster but
>> decide to start two works on two nodes
>>
>> This is in my slaves file
>>
>> localhost
>> rhes564
>> rhes564
>> rhes5
>> rhes5
>>
>> And it goes ahead and starts 5 worker processes as seen below
>>
>> [image: image.png]
>>
>> Regardless these are all started on the relevant nodes and the master
>> knows about it.
>>
>> However, when I start a spark-submit job there is only one driver and one
>> executor. That is expected as we only submitted one spark-submit job.
>>
>>
>>
>>
>>
>>  [image: image.png]
>> So it would be interesting to know what you see in the executor tab on
>> port 40:40 spark GUI..
>>
>>
>> HTH
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 20 May 2016 at 14:00, Mathieu Longtin <ma...@closetwork.org> wrote:
>>
>>> Correct, what I do to start workers is the equivalent of
>>> start-slaves.sh. It ends up running the same command on the worker servers
>>> as start-slaves does.
>>>
>>> It definitively uses all workers, and workers starting later pick up
>>> work as well. If you have a long running job, you can add workers
>>> dynamically and they will pick up work as long as there are enough
>>> partitions to go around.
>>>
>>> I set spark.locality.wait to 0 so that workers never wait to pick up
>>> tasks.
>>>
>>>
>>>
>>> On Fri, May 20, 2016 at 2:57 AM Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> OK this is basically form my notes for Spark standalone. Worker process
>>>> is the slave process
>>>>
>>>> [image: image.png]
>>>>
>>>>
>>>>
>>>> You start worker as you showed
>>>>
>>>> $SPARK_HOME/sbin/start-slaves.sh
>>>> Now that picks up the worker host node names from
>>>> $SPARK_HOME/conf/slaves files. So you still have to tell Spark where to run
>>>> workers.
>>>>
>>>> However, if I am correct regardless of what you have specified in
>>>> slaves, in this standalone mode there will not be any spark process spawned
>>>> by the driver on the slaves. In all probability you will be running one
>>>> spark-submit process on the driver node. You can see this through the
>>>> output of
>>>>
>>>> jps|grep SparkSubmit
>>>>
>>>> and you will see the details by running jmonitor for that SparkSubmit
>>>> job
>>>>
>>>> However, I still doubt whether Scheduling Across applications is
>>>> feasible in standalone mode.
>>>>
>>>> The doc says
>>>>
>>>> *Standalone mode:* By default, applications submitted to the
>>>> standalone mode cluster will run in FIFO (first-in-first-out) order, and
>>>> each application will try to use *all available nodes*. You can limit
>>>> the number of nodes an application uses by setting the spark.cores.max
>>>> configuration property in it, or change the default for applications that
>>>> don’t set this setting through spark.deploy.defaultCores. Finally, in
>>>> addition to controlling cores, each application’s spark.executor.memory
>>>> setting controls its memory use.
>>>>
>>>> It uses the word all available nodes but I am not convinced if it will
>>>> use those nodes? Someone can possibly clarify this
>>>>
>>>> HTH
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 20 May 2016 at 02:03, Mathieu Longtin <ma...@closetwork.org>
>>>> wrote:
>>>>
>>>>> Okay:
>>>>> *host=my.local.server*
>>>>> *port=someport*
>>>>>
>>>>> This is the spark-submit command, which runs on my local server:
>>>>> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
>>>>> --executor-memory 4g python-script.py with args*
>>>>>
>>>>> If I want 200 worker cores, I tell the cluster scheduler to run this
>>>>> command on 200 cores:
>>>>> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g
>>>>> spark://$host:$port *
>>>>>
>>>>> That's it. When the task starts, it uses all available workers. If for
>>>>> some reason, not enough cores are available immediately, it still starts
>>>>> processing with whatever it gets and the load will be spread further as
>>>>> workers come online.
>>>>>
>>>>>
>>>>> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> In a normal operation we tell spark which node the worker processes
>>>>>> can run by adding the nodenames to conf/slaves.
>>>>>>
>>>>>> Not very clear on this in your case all the jobs run locally with say
>>>>>> 100 executor cores like below:
>>>>>>
>>>>>>
>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>
>>>>>>                 --master local[*] \
>>>>>>
>>>>>>                 --driver-memory xg \  --default would be 512M
>>>>>>
>>>>>>                 --num-executors=1 \   -- This is the constraint in
>>>>>> stand-alone Spark cluster, whether specified or not
>>>>>>
>>>>>>                 --executor-memory=xG \ --
>>>>>>
>>>>>>                 --executor-cores=n \
>>>>>>
>>>>>> --master local[*] means all cores and --executor-cores in your case
>>>>>> need not be specified? or you can cap it like above --executor-cores=n.
>>>>>> If it is not specified then the Spark app will go and grab every
>>>>>> core. Although in practice that does not happen it is just an upper
>>>>>> ceiling. It is FIFO.
>>>>>>
>>>>>> What typical executor memory is specified in your case?
>>>>>>
>>>>>> Do you have a  sample snapshot of spark-submit job by any chance
>>>>>> Mathieu?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Mostly, the resource management is not up to the Spark master.
>>>>>>>
>>>>>>> We routinely start 100 executor-cores for 5 minute job, and they
>>>>>>> just quit when they are done. Then those processor cores can do something
>>>>>>> else entirely, they are not reserved for Spark at all.
>>>>>>>
>>>>>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> Then in theory every user can fire multiple spark-submit jobs. do
>>>>>>>> you cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf
>>>>>>>> , but I guess in reality every user submits one job only.
>>>>>>>>
>>>>>>>> This is an interesting model for two reasons:
>>>>>>>>
>>>>>>>>
>>>>>>>>    - It uses parallel processing across all the nodes or most of
>>>>>>>>    the nodes to minimise the processing time
>>>>>>>>    - it requires less intervention
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Driver memory is default. Executor memory depends on job, the
>>>>>>>>> caller decides how much memory to use. We don't specify --num-executors as
>>>>>>>>> we want all cores assigned to the local master, since they were started by
>>>>>>>>> the current user. No local executor.  --master=spark://localhost:someport.
>>>>>>>>> 1 core per executor.
>>>>>>>>>
>>>>>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Mathieu
>>>>>>>>>>
>>>>>>>>>> So it would be interesting to see what resources allocated in
>>>>>>>>>> your case, especially the num-executors and executor-cores. I gather every
>>>>>>>>>> node has enough memory and cores.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>>>
>>>>>>>>>>                 --master local[2] \
>>>>>>>>>>
>>>>>>>>>>                 --driver-memory 4g \
>>>>>>>>>>
>>>>>>>>>>                 --num-executors=1 \
>>>>>>>>>>
>>>>>>>>>>                 --executor-memory=4G \
>>>>>>>>>>
>>>>>>>>>>                 --executor-cores=2 \
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The driver (the process started by spark-submit) runs locally.
>>>>>>>>>>> The executors run on any of thousands of servers. So far, I haven't tried
>>>>>>>>>>> more than 500 executors.
>>>>>>>>>>>
>>>>>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> ok so you are using some form of NFS mounted file system shared
>>>>>>>>>>>> among the nodes and basically you start the processes through spark-submit.
>>>>>>>>>>>>
>>>>>>>>>>>> In Stand-alone mode, a simple cluster manager included with
>>>>>>>>>>>> Spark. It does the management of resources so it is not clear
>>>>>>>>>>>> to me what you are referring as worker manager here?
>>>>>>>>>>>>
>>>>>>>>>>>> This is my take from your model.
>>>>>>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>>>>>>> You only have one worker that lives within the driver JVM
>>>>>>>>>>>> process.
>>>>>>>>>>>> The Driver node runs on the same host that the cluster manager
>>>>>>>>>>>> is running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>>>>>>>> runs tasks for the Driver.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> HTH
>>>>>>>>>>>>
>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <
>>>>>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> No master and no node manager, just the processes that do
>>>>>>>>>>>>> actual work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>>>>>>>> system and a way of allocating computing resources already (Univa Grid
>>>>>>>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Mathieu
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <
>>>>>>>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> First a bit of context:
>>>>>>>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>>>>>> - start a master
>>>>>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>>>>>> - the driver then talks to the master, tell it how many
>>>>>>>>>>>>>>> executors it needs
>>>>>>>>>>>>>>> - the master tell the worker nodes to start executors and
>>>>>>>>>>>>>>> talk to the driver
>>>>>>>>>>>>>>> - the executors are started
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From here on, the master doesn't do much, neither do the
>>>>>>>>>>>>>>> process manager on the worker nodes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>>>>>> - Start the driver program
>>>>>>>>>>>>>>> - Start executors on a number of servers, telling them where
>>>>>>>>>>>>>>> to find the driver
>>>>>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>>>>>>>> managers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Mathieu Longtin
>>>>>>>>> 1-514-803-8977
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>

Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
If you give spark-submit a master address in the form of *--master
spark://masterhost:port*, it will connect to that master and grab all the
workers the master will give it, unless you specifically limit it. If you
run spark-submit without a --master (or with --master local), it will run
using local cores only, without ever talking to the master.

Btw, the only processes that look at *conf/slaves* are start-slaves.sh and
stop-slaves.sh. The master doesn't look at it; it will give work to any
workers that connect to it.

And yes, when I look at the executor tab, I see as many executors as I
asked for, not just one or two, and they all exhibit signs of doing
something (I/O and CPU).
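
If you do want to limit what a job grabs from the master, spark.cores.max is
the usual knob. For example (the numbers are just an illustration):

$SPARK_HOME/bin/spark-submit --master spark://masterhost:port \
    --conf spark.cores.max=8 --executor-memory 4g python-script.py with args
# spark.cores.max caps the total cores this application takes from the cluster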


On Fri, May 20, 2016 at 11:01 AM Mich Talebzadeh <mi...@gmail.com>
wrote:

> Ok so you have all the nodes names in slaves?
>
> I am sure you can do sbin/start-slaves.sh and a slave/work will start on
> every node specified in slaves.
>
> Now in that case the driver will know about the existence of all these
> workers. For example this is interesting. I run a standalone cluster but
> decide to start two works on two nodes
>
> This is in my slaves file
>
> localhost
> rhes564
> rhes564
> rhes5
> rhes5
>
> And it goes ahead and starts 5 worker processes as seen below
>
> [image: image.png]
>
> Regardless these are all started on the relevant nodes and the master
> knows about it.
>
> However, when I start a spark-submit job there is only one driver and one
> executor. That is expected as we only submitted one spark-submit job.
>
>
>
>
>
>  [image: image.png]
> So it would be interesting to know what you see in the executor tab on
> port 40:40 spark GUI..
>
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 20 May 2016 at 14:00, Mathieu Longtin <ma...@closetwork.org> wrote:
>
>> Correct, what I do to start workers is the equivalent of start-slaves.sh.
>> It ends up running the same command on the worker servers as start-slaves
>> does.
>>
>> It definitively uses all workers, and workers starting later pick up work
>> as well. If you have a long running job, you can add workers dynamically
>> and they will pick up work as long as there are enough partitions to go
>> around.
>>
>> I set spark.locality.wait to 0 so that workers never wait to pick up
>> tasks.
>>
>>
>>
>> On Fri, May 20, 2016 at 2:57 AM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> OK this is basically form my notes for Spark standalone. Worker process
>>> is the slave process
>>>
>>> [image: image.png]
>>>
>>>
>>>
>>> You start worker as you showed
>>>
>>> $SPARK_HOME/sbin/start-slaves.sh
>>> Now that picks up the worker host node names from
>>> $SPARK_HOME/conf/slaves files. So you still have to tell Spark where to run
>>> workers.
>>>
>>> However, if I am correct regardless of what you have specified in
>>> slaves, in this standalone mode there will not be any spark process spawned
>>> by the driver on the slaves. In all probability you will be running one
>>> spark-submit process on the driver node. You can see this through the
>>> output of
>>>
>>> jps|grep SparkSubmit
>>>
>>> and you will see the details by running jmonitor for that SparkSubmit job
>>>
>>> However, I still doubt whether Scheduling Across applications is
>>> feasible in standalone mode.
>>>
>>> The doc says
>>>
>>> *Standalone mode:* By default, applications submitted to the standalone
>>> mode cluster will run in FIFO (first-in-first-out) order, and each
>>> application will try to use *all available nodes*. You can limit the
>>> number of nodes an application uses by setting the spark.cores.max
>>> configuration property in it, or change the default for applications that
>>> don’t set this setting through spark.deploy.defaultCores. Finally, in
>>> addition to controlling cores, each application’s spark.executor.memory
>>> setting controls its memory use.
>>>
>>> It uses the word all available nodes but I am not convinced if it will
>>> use those nodes? Someone can possibly clarify this
>>>
>>> HTH
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 20 May 2016 at 02:03, Mathieu Longtin <ma...@closetwork.org> wrote:
>>>
>>>> Okay:
>>>> *host=my.local.server*
>>>> *port=someport*
>>>>
>>>> This is the spark-submit command, which runs on my local server:
>>>> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
>>>> --executor-memory 4g python-script.py with args*
>>>>
>>>> If I want 200 worker cores, I tell the cluster scheduler to run this
>>>> command on 200 cores:
>>>> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g
>>>> spark://$host:$port *
>>>>
>>>> That's it. When the task starts, it uses all available workers. If for
>>>> some reason, not enough cores are available immediately, it still starts
>>>> processing with whatever it gets and the load will be spread further as
>>>> workers come online.
>>>>
>>>>
>>>> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> In a normal operation we tell spark which node the worker processes
>>>>> can run by adding the nodenames to conf/slaves.
>>>>>
>>>>> Not very clear on this in your case all the jobs run locally with say
>>>>> 100 executor cores like below:
>>>>>
>>>>>
>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>
>>>>>                 --master local[*] \
>>>>>
>>>>>                 --driver-memory xg \  --default would be 512M
>>>>>
>>>>>                 --num-executors=1 \   -- This is the constraint in
>>>>> stand-alone Spark cluster, whether specified or not
>>>>>
>>>>>                 --executor-memory=xG \ --
>>>>>
>>>>>                 --executor-cores=n \
>>>>>
>>>>> --master local[*] means all cores and --executor-cores in your case
>>>>> need not be specified? or you can cap it like above --executor-cores=n.
>>>>> If it is not specified then the Spark app will go and grab every core.
>>>>> Although in practice that does not happen it is just an upper ceiling. It
>>>>> is FIFO.
>>>>>
>>>>> What typical executor memory is specified in your case?
>>>>>
>>>>> Do you have a  sample snapshot of spark-submit job by any chance
>>>>> Mathieu?
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org>
>>>>> wrote:
>>>>>
>>>>>> Mostly, the resource management is not up to the Spark master.
>>>>>>
>>>>>> We routinely start 100 executor-cores for 5 minute job, and they just
>>>>>> quit when they are done. Then those processor cores can do something else
>>>>>> entirely, they are not reserved for Spark at all.
>>>>>>
>>>>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> Then in theory every user can fire multiple spark-submit jobs. do
>>>>>>> you cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf , but
>>>>>>> I guess in reality every user submits one job only.
>>>>>>>
>>>>>>> This is an interesting model for two reasons:
>>>>>>>
>>>>>>>
>>>>>>>    - It uses parallel processing across all the nodes or most of
>>>>>>>    the nodes to minimise the processing time
>>>>>>>    - it requires less intervention
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Driver memory is default. Executor memory depends on job, the
>>>>>>>> caller decides how much memory to use. We don't specify --num-executors as
>>>>>>>> we want all cores assigned to the local master, since they were started by
>>>>>>>> the current user. No local executor.  --master=spark://localhost:someport.
>>>>>>>> 1 core per executor.
>>>>>>>>
>>>>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Mathieu
>>>>>>>>>
>>>>>>>>> So it would be interesting to see what resources allocated in your
>>>>>>>>> case, especially the num-executors and executor-cores. I gather every node
>>>>>>>>> has enough memory and cores.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>>
>>>>>>>>>                 --master local[2] \
>>>>>>>>>
>>>>>>>>>                 --driver-memory 4g \
>>>>>>>>>
>>>>>>>>>                 --num-executors=1 \
>>>>>>>>>
>>>>>>>>>                 --executor-memory=4G \
>>>>>>>>>
>>>>>>>>>                 --executor-cores=2 \
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The driver (the process started by spark-submit) runs locally.
>>>>>>>>>> The executors run on any of thousands of servers. So far, I haven't tried
>>>>>>>>>> more than 500 executors.
>>>>>>>>>>
>>>>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>>>>
>>>>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> ok so you are using some form of NFS mounted file system shared
>>>>>>>>>>> among the nodes and basically you start the processes through spark-submit.
>>>>>>>>>>>
>>>>>>>>>>> In Stand-alone mode, a simple cluster manager included with
>>>>>>>>>>> Spark. It does the management of resources so it is not clear
>>>>>>>>>>> to me what you are referring as worker manager here?
>>>>>>>>>>>
>>>>>>>>>>> This is my take from your model.
>>>>>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>>>>>> You only have one worker that lives within the driver JVM
>>>>>>>>>>> process.
>>>>>>>>>>> The Driver node runs on the same host that the cluster manager
>>>>>>>>>>> is running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>>>>>>> runs tasks for the Driver.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <mathieu@closetwork.org
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> No master and no node manager, just the processes that do
>>>>>>>>>>>> actual work.
>>>>>>>>>>>>
>>>>>>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>>>>>>> system and a way of allocating computing resources already (Univa Grid
>>>>>>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Mathieu
>>>>>>>>>>>>>
>>>>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <
>>>>>>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> First a bit of context:
>>>>>>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>>>>> - start a master
>>>>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>>>>> - the driver then talks to the master, tell it how many
>>>>>>>>>>>>>> executors it needs
>>>>>>>>>>>>>> - the master tell the worker nodes to start executors and
>>>>>>>>>>>>>> talk to the driver
>>>>>>>>>>>>>> - the executors are started
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From here on, the master doesn't do much, neither do the
>>>>>>>>>>>>>> process manager on the worker nodes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>>>>> - Start the driver program
>>>>>>>>>>>>>> - Start executors on a number of servers, telling them where
>>>>>>>>>>>>>> to find the driver
>>>>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>>>>>>> managers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> Mathieu Longtin
>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Mathieu Longtin
>>>>>>>> 1-514-803-8977
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977
>>>>>>
>>>>>
>>>>> --
>>>> Mathieu Longtin
>>>> 1-514-803-8977
>>>>
>>>
>>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
> --
Mathieu Longtin
1-514-803-8977

Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
Ok so you have all the node names in the slaves file?

I am sure you can do sbin/start-slaves.sh and a slave/worker will start on
every node specified in slaves.

Now in that case the driver will know about the existence of all these
workers. For example, this is interesting: I run a standalone cluster but
decide to start two workers on each of two nodes.

This is in my slaves file

localhost
rhes564
rhes564
rhes5
rhes5

And it goes ahead and starts 5 worker processes as seen below

[screenshot omitted: the five worker processes]

Regardless, these are all started on the relevant nodes and the master
knows about them.

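(As an aside, and not something from this thread: listing a host more than
once in slaves is what gives you the extra workers per node; the same
effect can be had through conf/spark-env.sh, e.g.

export SPARK_WORKER_INSTANCES=2   # two worker instances per node

before running sbin/start-slaves.sh.)
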
However, when I start a spark-submit job there is only one driver and one
executor. That is expected as we only submitted one spark-submit job.





[screenshot omitted: the single driver and executor for the application]

So it would be interesting to know what you see in the executors tab of the
Spark GUI on port 4040.


HTH


Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 20 May 2016 at 14:00, Mathieu Longtin <ma...@closetwork.org> wrote:

> Correct, what I do to start workers is the equivalent of start-slaves.sh.
> It ends up running the same command on the worker servers as start-slaves
> does.
>
> It definitively uses all workers, and workers starting later pick up work
> as well. If you have a long running job, you can add workers dynamically
> and they will pick up work as long as there are enough partitions to go
> around.
>
> I set spark.locality.wait to 0 so that workers never wait to pick up tasks.
>
>
>
> On Fri, May 20, 2016 at 2:57 AM Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> OK this is basically form my notes for Spark standalone. Worker process
>> is the slave process
>>
>> [image: Inline images 2]
>>
>>
>>
>> You start worker as you showed
>>
>> $SPARK_HOME/sbin/start-slaves.sh
>> Now that picks up the worker host node names from $SPARK_HOME/conf/slaves
>> files. So you still have to tell Spark where to run workers.
>>
>> However, if I am correct regardless of what you have specified in slaves,
>> in this standalone mode there will not be any spark process spawned by the
>> driver on the slaves. In all probability you will be running one
>> spark-submit process on the driver node. You can see this through the
>> output of
>>
>> jps|grep SparkSubmit
>>
>> and you will see the details by running jmonitor for that SparkSubmit job
>>
>> However, I still doubt whether Scheduling Across applications is feasible
>> in standalone mode.
>>
>> The doc says
>>
>> *Standalone mode:* By default, applications submitted to the standalone
>> mode cluster will run in FIFO (first-in-first-out) order, and each
>> application will try to use *all available nodes*. You can limit the
>> number of nodes an application uses by setting the spark.cores.max
>> configuration property in it, or change the default for applications that
>> don’t set this setting through spark.deploy.defaultCores. Finally, in
>> addition to controlling cores, each application’s spark.executor.memory
>> setting controls its memory use.
>>
>> It uses the word all available nodes but I am not convinced if it will
>> use those nodes? Someone can possibly clarify this
>>
>> HTH
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 20 May 2016 at 02:03, Mathieu Longtin <ma...@closetwork.org> wrote:
>>
>>> Okay:
>>> *host=my.local.server*
>>> *port=someport*
>>>
>>> This is the spark-submit command, which runs on my local server:
>>> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
>>> --executor-memory 4g python-script.py with args*
>>>
>>> If I want 200 worker cores, I tell the cluster scheduler to run this
>>> command on 200 cores:
>>> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g
>>> spark://$host:$port *
>>>
>>> That's it. When the task starts, it uses all available workers. If for
>>> some reason, not enough cores are available immediately, it still starts
>>> processing with whatever it gets and the load will be spread further as
>>> workers come online.
>>>
>>>
>>> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> In a normal operation we tell spark which node the worker processes can
>>>> run by adding the nodenames to conf/slaves.
>>>>
>>>> Not very clear on this in your case all the jobs run locally with say
>>>> 100 executor cores like below:
>>>>
>>>>
>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>
>>>>                 --master local[*] \
>>>>
>>>>                 --driver-memory xg \  --default would be 512M
>>>>
>>>>                 --num-executors=1 \   -- This is the constraint in
>>>> stand-alone Spark cluster, whether specified or not
>>>>
>>>>                 --executor-memory=xG \ --
>>>>
>>>>                 --executor-cores=n \
>>>>
>>>> --master local[*] means all cores and --executor-cores in your case
>>>> need not be specified? or you can cap it like above --executor-cores=n.
>>>> If it is not specified then the Spark app will go and grab every core.
>>>> Although in practice that does not happen it is just an upper ceiling. It
>>>> is FIFO.
>>>>
>>>> What typical executor memory is specified in your case?
>>>>
>>>> Do you have a  sample snapshot of spark-submit job by any chance
>>>> Mathieu?
>>>>
>>>> Cheers
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org>
>>>> wrote:
>>>>
>>>>> Mostly, the resource management is not up to the Spark master.
>>>>>
>>>>> We routinely start 100 executor-cores for 5 minute job, and they just
>>>>> quit when they are done. Then those processor cores can do something else
>>>>> entirely, they are not reserved for Spark at all.
>>>>>
>>>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Then in theory every user can fire multiple spark-submit jobs. do you
>>>>>> cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf , but
>>>>>> I guess in reality every user submits one job only.
>>>>>>
>>>>>> This is an interesting model for two reasons:
>>>>>>
>>>>>>
>>>>>>    - It uses parallel processing across all the nodes or most of the
>>>>>>    nodes to minimise the processing time
>>>>>>    - it requires less intervention
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Driver memory is default. Executor memory depends on job, the caller
>>>>>>> decides how much memory to use. We don't specify --num-executors as we want
>>>>>>> all cores assigned to the local master, since they were started by the
>>>>>>> current user. No local executor.  --master=spark://localhost:someport. 1
>>>>>>> core per executor.
>>>>>>>
>>>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Mathieu
>>>>>>>>
>>>>>>>> So it would be interesting to see what resources allocated in your
>>>>>>>> case, especially the num-executors and executor-cores. I gather every node
>>>>>>>> has enough memory and cores.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>
>>>>>>>>                 --master local[2] \
>>>>>>>>
>>>>>>>>                 --driver-memory 4g \
>>>>>>>>
>>>>>>>>                 --num-executors=1 \
>>>>>>>>
>>>>>>>>                 --executor-memory=4G \
>>>>>>>>
>>>>>>>>                 --executor-cores=2 \
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The driver (the process started by spark-submit) runs locally. The
>>>>>>>>> executors run on any of thousands of servers. So far, I haven't tried more
>>>>>>>>> than 500 executors.
>>>>>>>>>
>>>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>>>
>>>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> ok so you are using some form of NFS mounted file system shared
>>>>>>>>>> among the nodes and basically you start the processes through spark-submit.
>>>>>>>>>>
>>>>>>>>>> In Stand-alone mode, a simple cluster manager included with
>>>>>>>>>> Spark. It does the management of resources so it is not clear to
>>>>>>>>>> me what you are referring as worker manager here?
>>>>>>>>>>
>>>>>>>>>> This is my take from your model.
>>>>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>>>>> You only have one worker that lives within the driver JVM
>>>>>>>>>> process.
>>>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>>>>>> runs tasks for the Driver.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> No master and no node manager, just the processes that do actual
>>>>>>>>>>> work.
>>>>>>>>>>>
>>>>>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>>>>>> system and a way of allocating computing resources already (Univa Grid
>>>>>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Mathieu
>>>>>>>>>>>>
>>>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>>>
>>>>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>>>>
>>>>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <
>>>>>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> First a bit of context:
>>>>>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>>>> - start a master
>>>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>>>> - the driver then talks to the master, tell it how many
>>>>>>>>>>>>> executors it needs
>>>>>>>>>>>>> - the master tell the worker nodes to start executors and talk
>>>>>>>>>>>>> to the driver
>>>>>>>>>>>>> - the executors are started
>>>>>>>>>>>>>
>>>>>>>>>>>>> From here on, the master doesn't do much, neither do the
>>>>>>>>>>>>> process manager on the worker nodes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>>>> - Start the driver program
>>>>>>>>>>>>> - Start executors on a number of servers, telling them where
>>>>>>>>>>>>> to find the driver
>>>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>>>>>> managers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Mathieu Longtin
>>>>>>>>> 1-514-803-8977
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>

Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
Correct, what I do to start workers is the equivalent of start-slaves.sh.
It ends up running the same command on the worker servers as start-slaves
does.

It definitely uses all workers, and workers starting later pick up work
as well. If you have a long-running job, you can add workers dynamically
and they will pick up work as long as there are enough partitions to go
around.

I set spark.locality.wait to 0 so that workers never wait to pick up tasks.
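
For reference, either of these does it (a minimal sketch, not our exact
command line):

$SPARK_HOME/bin/spark-submit --master spark://$host:$port \
  --conf spark.locality.wait=0 \
  --executor-memory 4g python-script.py with args

or, once, in $SPARK_HOME/conf/spark-defaults.conf:

spark.locality.wait   0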



On Fri, May 20, 2016 at 2:57 AM Mich Talebzadeh <mi...@gmail.com>
wrote:

> OK this is basically form my notes for Spark standalone. Worker process is
> the slave process
>
> [image: Inline images 2]
>
>
>
> You start worker as you showed
>
> $SPARK_HOME/sbin/start-slaves.sh
> Now that picks up the worker host node names from $SPARK_HOME/conf/slaves
> files. So you still have to tell Spark where to run workers.
>
> However, if I am correct regardless of what you have specified in slaves,
> in this standalone mode there will not be any spark process spawned by the
> driver on the slaves. In all probability you will be running one
> spark-submit process on the driver node. You can see this through the
> output of
>
> jps|grep SparkSubmit
>
> and you will see the details by running jmonitor for that SparkSubmit job
>
> However, I still doubt whether Scheduling Across applications is feasible
> in standalone mode.
>
> The doc says
>
> *Standalone mode:* By default, applications submitted to the standalone
> mode cluster will run in FIFO (first-in-first-out) order, and each
> application will try to use *all available nodes*. You can limit the
> number of nodes an application uses by setting the spark.cores.max
> configuration property in it, or change the default for applications that
> don’t set this setting through spark.deploy.defaultCores. Finally, in
> addition to controlling cores, each application’s spark.executor.memory
> setting controls its memory use.
>
> It uses the word all available nodes but I am not convinced if it will use
> those nodes? Someone can possibly clarify this
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 20 May 2016 at 02:03, Mathieu Longtin <ma...@closetwork.org> wrote:
>
>> Okay:
>> *host=my.local.server*
>> *port=someport*
>>
>> This is the spark-submit command, which runs on my local server:
>> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
>> --executor-memory 4g python-script.py with args*
>>
>> If I want 200 worker cores, I tell the cluster scheduler to run this
>> command on 200 cores:
>> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g
>> spark://$host:$port *
>>
>> That's it. When the task starts, it uses all available workers. If for
>> some reason, not enough cores are available immediately, it still starts
>> processing with whatever it gets and the load will be spread further as
>> workers come online.
>>
>>
>> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> In a normal operation we tell spark which node the worker processes can
>>> run by adding the nodenames to conf/slaves.
>>>
>>> Not very clear on this in your case all the jobs run locally with say
>>> 100 executor cores like below:
>>>
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>
>>>                 --master local[*] \
>>>
>>>                 --driver-memory xg \  --default would be 512M
>>>
>>>                 --num-executors=1 \   -- This is the constraint in
>>> stand-alone Spark cluster, whether specified or not
>>>
>>>                 --executor-memory=xG \ --
>>>
>>>                 --executor-cores=n \
>>>
>>> --master local[*] means all cores and --executor-cores in your case need
>>> not be specified? or you can cap it like above --executor-cores=n. If
>>> it is not specified then the Spark app will go and grab every core.
>>> Although in practice that does not happen it is just an upper ceiling. It
>>> is FIFO.
>>>
>>> What typical executor memory is specified in your case?
>>>
>>> Do you have a  sample snapshot of spark-submit job by any chance Mathieu?
>>>
>>> Cheers
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org> wrote:
>>>
>>>> Mostly, the resource management is not up to the Spark master.
>>>>
>>>> We routinely start 100 executor-cores for 5 minute job, and they just
>>>> quit when they are done. Then those processor cores can do something else
>>>> entirely, they are not reserved for Spark at all.
>>>>
>>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Then in theory every user can fire multiple spark-submit jobs. do you
>>>>> cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf , but I
>>>>> guess in reality every user submits one job only.
>>>>>
>>>>> This is an interesting model for two reasons:
>>>>>
>>>>>
>>>>>    - It uses parallel processing across all the nodes or most of the
>>>>>    nodes to minimise the processing time
>>>>>    - it requires less intervention
>>>>>
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org>
>>>>> wrote:
>>>>>
>>>>>> Driver memory is default. Executor memory depends on job, the caller
>>>>>> decides how much memory to use. We don't specify --num-executors as we want
>>>>>> all cores assigned to the local master, since they were started by the
>>>>>> current user. No local executor.  --master=spark://localhost:someport. 1
>>>>>> core per executor.
>>>>>>
>>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Mathieu
>>>>>>>
>>>>>>> So it would be interesting to see what resources allocated in your
>>>>>>> case, especially the num-executors and executor-cores. I gather every node
>>>>>>> has enough memory and cores.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>
>>>>>>>                 --master local[2] \
>>>>>>>
>>>>>>>                 --driver-memory 4g \
>>>>>>>
>>>>>>>                 --num-executors=1 \
>>>>>>>
>>>>>>>                 --executor-memory=4G \
>>>>>>>
>>>>>>>                 --executor-cores=2 \
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <ma...@closetwork.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The driver (the process started by spark-submit) runs locally. The
>>>>>>>> executors run on any of thousands of servers. So far, I haven't tried more
>>>>>>>> than 500 executors.
>>>>>>>>
>>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>>
>>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> ok so you are using some form of NFS mounted file system shared
>>>>>>>>> among the nodes and basically you start the processes through spark-submit.
>>>>>>>>>
>>>>>>>>> In Stand-alone mode, a simple cluster manager included with
>>>>>>>>> Spark. It does the management of resources so it is not clear to
>>>>>>>>> me what you are referring as worker manager here?
>>>>>>>>>
>>>>>>>>> This is my take from your model.
>>>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>>>> You only have one worker that lives within the driver JVM process.
>>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>>>>> runs tasks for the Driver.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> No master and no node manager, just the processes that do actual
>>>>>>>>>> work.
>>>>>>>>>>
>>>>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>>>>> system and a way of allocating computing resources already (Univa Grid
>>>>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>>>
>>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mathieu
>>>>>>>>>>>
>>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>>
>>>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>>>
>>>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <mathieu@closetwork.org
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> First a bit of context:
>>>>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>>>>
>>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>>> - start a master
>>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>>> - the driver then talks to the master, tell it how many
>>>>>>>>>>>> executors it needs
>>>>>>>>>>>> - the master tell the worker nodes to start executors and talk
>>>>>>>>>>>> to the driver
>>>>>>>>>>>> - the executors are started
>>>>>>>>>>>>
>>>>>>>>>>>> From here on, the master doesn't do much, neither do the
>>>>>>>>>>>> process manager on the worker nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>>> - Start the driver program
>>>>>>>>>>>> - Start executors on a number of servers, telling them where to
>>>>>>>>>>>> find the driver
>>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>>>>> managers?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> Mathieu Longtin
>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Mathieu Longtin
>>>>>>>> 1-514-803-8977
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977
>>>>>>
>>>>>
>>>>> --
>>>> Mathieu Longtin
>>>> 1-514-803-8977
>>>>
>>>
>>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
> --
Mathieu Longtin
1-514-803-8977

Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK this is basically from my notes for Spark standalone. The worker process
is the slave process.

[diagram from my notes omitted]



You start workers as you showed:

$SPARK_HOME/sbin/start-slaves.sh

Now that picks up the worker host names from the $SPARK_HOME/conf/slaves
file. So you still have to tell Spark where to run workers.

However, if I am correct, regardless of what you have specified in slaves,
in this standalone mode there will not be any Spark process spawned by the
driver on the slaves. In all probability you will be running one
spark-submit process on the driver node. You can see this through the
output of

jps|grep SparkSubmit

and you will see the details by running jmonitor for that SparkSubmit job

However, I still doubt whether Scheduling Across applications is feasible
in standalone mode.

The doc says

*Standalone mode:* By default, applications submitted to the standalone
mode cluster will run in FIFO (first-in-first-out) order, and each
application will try to use *all available nodes*. You can limit the number
of nodes an application uses by setting the spark.cores.max configuration
property in it, or change the default for applications that don’t set this
setting through spark.deploy.defaultCores. Finally, in addition to
controlling cores, each application’s spark.executor.memory setting
controls its memory use.

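For example, capping one application at 8 cores would look something like
this (just a sketch to show where the settings go, not a command from this
thread; your-app.py is a placeholder):

$SPARK_HOME/bin/spark-submit --master spark://$host:$port \
  --conf spark.cores.max=8 \
  --conf spark.executor.memory=4g \
  your-app.py
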
It uses the words "all available nodes" but I am not convinced it will use
those nodes. Can someone possibly clarify this?

HTH


Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 20 May 2016 at 02:03, Mathieu Longtin <ma...@closetwork.org> wrote:

> Okay:
> *host=my.local.server*
> *port=someport*
>
> This is the spark-submit command, which runs on my local server:
> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
> --executor-memory 4g python-script.py with args*
>
> If I want 200 worker cores, I tell the cluster scheduler to run this
> command on 200 cores:
> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g
> spark://$host:$port *
>
> That's it. When the task starts, it uses all available workers. If for
> some reason, not enough cores are available immediately, it still starts
> processing with whatever it gets and the load will be spread further as
> workers come online.
>
>
> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> In a normal operation we tell spark which node the worker processes can
>> run by adding the nodenames to conf/slaves.
>>
>> Not very clear on this in your case all the jobs run locally with say 100
>> executor cores like below:
>>
>>
>> ${SPARK_HOME}/bin/spark-submit \
>>
>>                 --master local[*] \
>>
>>                 --driver-memory xg \  --default would be 512M
>>
>>                 --num-executors=1 \   -- This is the constraint in
>> stand-alone Spark cluster, whether specified or not
>>
>>                 --executor-memory=xG \ --
>>
>>                 --executor-cores=n \
>>
>> --master local[*] means all cores and --executor-cores in your case need
>> not be specified? or you can cap it like above --executor-cores=n. If it
>> is not specified then the Spark app will go and grab every core. Although
>> in practice that does not happen it is just an upper ceiling. It is FIFO.
>>
>> What typical executor memory is specified in your case?
>>
>> Do you have a  sample snapshot of spark-submit job by any chance Mathieu?
>>
>> Cheers
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org> wrote:
>>
>>> Mostly, the resource management is not up to the Spark master.
>>>
>>> We routinely start 100 executor-cores for 5 minute job, and they just
>>> quit when they are done. Then those processor cores can do something else
>>> entirely, they are not reserved for Spark at all.
>>>
>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Then in theory every user can fire multiple spark-submit jobs. do you
>>>> cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf , but I
>>>> guess in reality every user submits one job only.
>>>>
>>>> This is an interesting model for two reasons:
>>>>
>>>>
>>>>    - It uses parallel processing across all the nodes or most of the
>>>>    nodes to minimise the processing time
>>>>    - it requires less intervention
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org>
>>>> wrote:
>>>>
>>>>> Driver memory is default. Executor memory depends on job, the caller
>>>>> decides how much memory to use. We don't specify --num-executors as we want
>>>>> all cores assigned to the local master, since they were started by the
>>>>> current user. No local executor.  --master=spark://localhost:someport. 1
>>>>> core per executor.
>>>>>
>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Thanks Mathieu
>>>>>>
>>>>>> So it would be interesting to see what resources allocated in your
>>>>>> case, especially the num-executors and executor-cores. I gather every node
>>>>>> has enough memory and cores.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>
>>>>>>                 --master local[2] \
>>>>>>
>>>>>>                 --driver-memory 4g \
>>>>>>
>>>>>>                 --num-executors=1 \
>>>>>>
>>>>>>                 --executor-memory=4G \
>>>>>>
>>>>>>                 --executor-cores=2 \
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <ma...@closetwork.org>
>>>>>> wrote:
>>>>>>
>>>>>>> The driver (the process started by spark-submit) runs locally. The
>>>>>>> executors run on any of thousands of servers. So far, I haven't tried more
>>>>>>> than 500 executors.
>>>>>>>
>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>
>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> ok so you are using some form of NFS mounted file system shared
>>>>>>>> among the nodes and basically you start the processes through spark-submit.
>>>>>>>>
>>>>>>>> In Stand-alone mode, a simple cluster manager included with
>>>>>>>> Spark. It does the management of resources so it is not clear to
>>>>>>>> me what you are referring as worker manager here?
>>>>>>>>
>>>>>>>> This is my take from your model.
>>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>>> You only have one worker that lives within the driver JVM process.
>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>>>> runs tasks for the Driver.
>>>>>>>>
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> No master and no node manager, just the processes that do actual
>>>>>>>>> work.
>>>>>>>>>
>>>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>>>> system and a way of allocating computing resources already (Univa Grid
>>>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>>
>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mathieu
>>>>>>>>>>
>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>
>>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>>
>>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> First a bit of context:
>>>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>>>
>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>> - start a master
>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>> - the driver then talks to the master, tell it how many
>>>>>>>>>>> executors it needs
>>>>>>>>>>> - the master tell the worker nodes to start executors and talk
>>>>>>>>>>> to the driver
>>>>>>>>>>> - the executors are started
>>>>>>>>>>>
>>>>>>>>>>> From here on, the master doesn't do much, neither do the process
>>>>>>>>>>> manager on the worker nodes.
>>>>>>>>>>>
>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>> - Start the driver program
>>>>>>>>>>> - Start executors on a number of servers, telling them where to
>>>>>>>>>>> find the driver
>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>
>>>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>>>> managers?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Mathieu Longtin
>>>>>>>>> 1-514-803-8977
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>

Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
Okay:
host=my.local.server
port=someport

This is the spark-submit command, which runs on my local server:

$SPARK_HOME/bin/spark-submit --master spark://$host:$port \
  --executor-memory 4g python-script.py with args

If I want 200 worker cores, I tell the cluster scheduler to run this
command on 200 cores:

$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port

That's it. When the task starts, it uses all available workers. If, for
some reason, not enough cores are available immediately, it still starts
processing with whatever it gets, and the load will be spread further as
workers come online.
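
Put together, the whole flow looks something like this (a sketch only;
"submit_to_grid" is a stand-in for whatever command your cluster scheduler
provides to run a program on N cores, and the master still sits next to
the driver):

host=my.local.server
port=7077    # or any free port
$SPARK_HOME/sbin/start-master.sh --host $host --port $port
submit_to_grid --cores 200 -- \
    $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port
$SPARK_HOME/bin/spark-submit --master spark://$host:$port \
    --executor-memory 4g python-script.py with args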


On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <mi...@gmail.com>
wrote:

> In a normal operation we tell spark which node the worker processes can
> run by adding the nodenames to conf/slaves.
>
> Not very clear on this in your case all the jobs run locally with say 100
> executor cores like below:
>
>
> ${SPARK_HOME}/bin/spark-submit \
>
>                 --master local[*] \
>
>                 --driver-memory xg \  --default would be 512M
>
>                 --num-executors=1 \   -- This is the constraint in
> stand-alone Spark cluster, whether specified or not
>
>                 --executor-memory=xG \ --
>
>                 --executor-cores=n \
>
> --master local[*] means all cores and --executor-cores in your case need
> not be specified? or you can cap it like above --executor-cores=n. If it
> is not specified then the Spark app will go and grab every core. Although
> in practice that does not happen it is just an upper ceiling. It is FIFO.
>
> What typical executor memory is specified in your case?
>
> Do you have a  sample snapshot of spark-submit job by any chance Mathieu?
>
> Cheers
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org> wrote:
>
>> Mostly, the resource management is not up to the Spark master.
>>
>> We routinely start 100 executor-cores for 5 minute job, and they just
>> quit when they are done. Then those processor cores can do something else
>> entirely, they are not reserved for Spark at all.
>>
>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Then in theory every user can fire multiple spark-submit jobs. do you
>>> cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf , but I
>>> guess in reality every user submits one job only.
>>>
>>> This is an interesting model for two reasons:
>>>
>>>
>>>    - It uses parallel processing across all the nodes or most of the
>>>    nodes to minimise the processing time
>>>    - it requires less intervention
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org> wrote:
>>>
>>>> Driver memory is default. Executor memory depends on job, the caller
>>>> decides how much memory to use. We don't specify --num-executors as we want
>>>> all cores assigned to the local master, since they were started by the
>>>> current user. No local executor.  --master=spark://localhost:someport. 1
>>>> core per executor.
>>>>
>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Thanks Mathieu
>>>>>
>>>>> So it would be interesting to see what resources allocated in your
>>>>> case, especially the num-executors and executor-cores. I gather every node
>>>>> has enough memory and cores.
>>>>>
>>>>>
>>>>>
>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>
>>>>>                 --master local[2] \
>>>>>
>>>>>                 --driver-memory 4g \
>>>>>
>>>>>                 --num-executors=1 \
>>>>>
>>>>>                 --executor-memory=4G \
>>>>>
>>>>>                 --executor-cores=2 \
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <ma...@closetwork.org>
>>>>> wrote:
>>>>>
>>>>>> The driver (the process started by spark-submit) runs locally. The
>>>>>> executors run on any of thousands of servers. So far, I haven't tried more
>>>>>> than 500 executors.
>>>>>>
>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>
>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> ok so you are using some form of NFS mounted file system shared
>>>>>>> among the nodes and basically you start the processes through spark-submit.
>>>>>>>
>>>>>>> In Stand-alone mode, a simple cluster manager included with Spark. It
>>>>>>> does the management of resources so it is not clear to me what you are
>>>>>>> referring as worker manager here?
>>>>>>>
>>>>>>> This is my take from your model.
>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>> You only have one worker that lives within the driver JVM process.
>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>>> runs tasks for the Driver.
>>>>>>>
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <ma...@closetwork.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> No master and no node manager, just the processes that do actual
>>>>>>>> work.
>>>>>>>>
>>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>>> system and a way of allocating computing resources already (Univa Grid
>>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>
>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mathieu
>>>>>>>>>
>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>
>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>
>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> First a bit of context:
>>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>>
>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>> - start a master
>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>> - the driver then talks to the master, tell it how many executors
>>>>>>>>>> it needs
>>>>>>>>>> - the master tell the worker nodes to start executors and talk to
>>>>>>>>>> the driver
>>>>>>>>>> - the executors are started
>>>>>>>>>>
>>>>>>>>>> From here on, the master doesn't do much, neither do the process
>>>>>>>>>> manager on the worker nodes.
>>>>>>>>>>
>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>> - Start the driver program
>>>>>>>>>> - Start executors on a number of servers, telling them where to
>>>>>>>>>> find the driver
>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>
>>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>>> managers?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Mathieu Longtin
>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Mathieu Longtin
>>>>>>>> 1-514-803-8977
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977
>>>>>>
>>>>>
>>>>> --
>>>> Mathieu Longtin
>>>> 1-514-803-8977
>>>>
>>>
>>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
> --
Mathieu Longtin
1-514-803-8977

Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
In a normal operation we tell Spark which nodes the worker processes can run
on by adding the node names to conf/slaves.

It is not very clear to me whether in your case all the jobs run locally
with, say, 100 executor cores, like below:


${SPARK_HOME}/bin/spark-submit \
                --master local[*] \
                --driver-memory xg \
                --num-executors=1 \
                --executor-memory=xG \
                --executor-cores=n

(--driver-memory would default to 512M; --num-executors=1 is the
constraint in a stand-alone Spark cluster, whether specified or not.)

--master local[*] means all cores, so --executor-cores in your case need
not be specified, or you can cap it as above with --executor-cores=n. If it
is not specified then the Spark app will go and grab every core, although
in practice that does not happen; it is just an upper ceiling. It is FIFO.

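For instance (just a sketch, with your-app.py as a placeholder):

$SPARK_HOME/bin/spark-submit --master local[4] your-app.py   # capped at 4 cores
$SPARK_HOME/bin/spark-submit --master local[*] your-app.py   # every core on the box
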
What typical executor memory is specified in your case?

Do you have a sample snapshot of a spark-submit job by any chance, Mathieu?

Cheers


Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 20 May 2016 at 00:27, Mathieu Longtin <ma...@closetwork.org> wrote:

> Mostly, the resource management is not up to the Spark master.
>
> We routinely start 100 executor-cores for 5 minute job, and they just quit
> when they are done. Then those processor cores can do something else
> entirely, they are not reserved for Spark at all.
>
> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> Then in theory every user can fire multiple spark-submit jobs. do you cap
>> it with settings in  $SPARK_HOME/conf/spark-defaults.conf , but I guess
>> in reality every user submits one job only.
>>
>> This is an interesting model for two reasons:
>>
>>
>>    - It uses parallel processing across all the nodes or most of the
>>    nodes to minimise the processing time
>>    - it requires less intervention
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 19 May 2016 at 21:33, Mathieu Longtin <ma...@closetwork.org> wrote:
>>
>>> Driver memory is default. Executor memory depends on job, the caller
>>> decides how much memory to use. We don't specify --num-executors as we want
>>> all cores assigned to the local master, since they were started by the
>>> current user. No local executor.  --master=spark://localhost:someport. 1
>>> core per executor.
>>>
>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Thanks Mathieu
>>>>
>>>> So it would be interesting to see what resources allocated in your
>>>> case, especially the num-executors and executor-cores. I gather every node
>>>> has enough memory and cores.
>>>>
>>>>
>>>>
>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>
>>>>                 --master local[2] \
>>>>
>>>>                 --driver-memory 4g \
>>>>
>>>>                 --num-executors=1 \
>>>>
>>>>                 --executor-memory=4G \
>>>>
>>>>                 --executor-cores=2 \
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 19 May 2016 at 21:02, Mathieu Longtin <ma...@closetwork.org>
>>>> wrote:
>>>>
>>>>> The driver (the process started by spark-submit) runs locally. The
>>>>> executors run on any of thousands of servers. So far, I haven't tried more
>>>>> than 500 executors.
>>>>>
>>>>> Right now, I run a master on the same server as the driver.
>>>>>
>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> ok so you are using some form of NFS mounted file system shared among
>>>>>> the nodes and basically you start the processes through spark-submit.
>>>>>>
>>>>>> In Stand-alone mode, a simple cluster manager included with Spark. It
>>>>>> does the management of resources so it is not clear to me what you are
>>>>>> referring as worker manager here?
>>>>>>
>>>>>> This is my take from your model.
>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>> You only have one worker that lives within the driver JVM process.
>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>> tasks. In this case there is only one executor for the Driver? The Executor
>>>>>> runs tasks for the Driver.
>>>>>>
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <ma...@closetwork.org>
>>>>>> wrote:
>>>>>>
>>>>>>> No master and no node manager, just the processes that do actual
>>>>>>> work.
>>>>>>>
>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>> system and a way of allocating computing resources already (Univa Grid
>>>>>>> Engine). If an executor were to die, we have other ways of restarting it,
>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>
>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Mathieu
>>>>>>>>
>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>
>>>>>>>> So basically each node has its master in this model.
>>>>>>>>
>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <ma...@closetwork.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> First a bit of context:
>>>>>>>>> We use Spark on a platform where each user start workers as
>>>>>>>>> needed. This has the advantage that all permission management is handled by
>>>>>>>>> the OS, so the users can only read files they have permission to.
>>>>>>>>>
>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>> - start a master
>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>> - the driver then talks to the master, tell it how many executors
>>>>>>>>> it needs
>>>>>>>>> - the master tell the worker nodes to start executors and talk to
>>>>>>>>> the driver
>>>>>>>>> - the executors are started
>>>>>>>>>
>>>>>>>>> From here on, the master doesn't do much, neither do the process
>>>>>>>>> manager on the worker nodes.
>>>>>>>>>
>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>> - Start the driver program
>>>>>>>>> - Start executors on a number of servers, telling them where to
>>>>>>>>> find the driver
>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>
>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>> managers?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mathieu Longtin
>>>>>>>>> 1-514-803-8977
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>

Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
Mostly, the resource management is not up to the Spark master.

We routinely start 100 executor-cores for a 5 minute job, and they just quit
when they are done. Then those processor cores can do something else
entirely; they are not reserved for Spark at all.


Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
Then in theory every user can fire off multiple spark-submit jobs. Do you
cap this with settings in $SPARK_HOME/conf/spark-defaults.conf? I guess in
reality every user submits only one job.
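
If one did want to cap it there, I imagine something along these lines in
that file would do it (the property names are standard Spark settings; the
values are only illustrative):

# $SPARK_HOME/conf/spark-defaults.conf -- per-application caps
spark.cores.max        16
spark.executor.memory  4g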

This is an interesting model for two reasons:


   - It uses parallel processing across all (or most of) the nodes to
   minimise the processing time
   - It requires less intervention




Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
Driver memory is the default. Executor memory depends on the job; the caller
decides how much memory to use. We don't specify --num-executors because we
want all cores assigned to the local master, since they were started by the
current user. There is no local executor. We use
--master=spark://localhost:someport, with 1 core per executor.
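
So a submit ends up looking roughly like this (the port, memory value and
script name are placeholders chosen per job by the caller):

$SPARK_HOME/bin/spark-submit \
    --master spark://localhost:7077 \
    --executor-memory 8G \
    --executor-cores 1 \
    my_job.py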


Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks Mathieu

So it would be interesting to see what resources are allocated in your case,
especially num-executors and executor-cores. I gather every node has enough
memory and cores.



${SPARK_HOME}/bin/spark-submit \
                --master local[2] \
                --driver-memory 4g \
                --num-executors=1 \
                --executor-memory=4G \
                --executor-cores=2 \


Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
The driver (the process started by spark-submit) runs locally. The
executors run on any of thousands of servers. So far, I haven't tried more
than 500 executors.

Right now, I run a master on the same server as the driver.
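
Concretely, that part looks something like the following sketch (host name
and core count are placeholders; the utility we use wraps these scripts):

# On the driver host: start a standalone master (listens on 7077 by default).
$SPARK_HOME/sbin/start-master.sh

# On each server the user was allocated: start a worker that registers with
# that master and offers a single core.
$SPARK_HOME/sbin/start-slave.sh spark://driver-host:7077 --cores 1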


Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, so you are using some form of NFS-mounted file system shared among the
nodes, and you basically start the processes through spark-submit.

Stand-alone mode uses the simple cluster manager included with Spark. It
handles resource management, so it is not clear to me what you are referring
to as the worker manager here.

This is my take on your model:
The application will go and grab all the cores in the cluster.
You only have one worker, which lives within the driver JVM process.
The Driver node runs on the same host as the cluster manager. The Driver
requests resources from the Cluster Manager to run tasks. In this case there
is only one executor for the Driver? The Executor runs tasks for the Driver.
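
One quick way to see which of these is actually the case: the standalone
master's web UI (port 8080 by default; adjust if it was changed) lists the
registered workers and the cores and executors each application holds.

# Quick check from the driver host; "driver-host" is a placeholder.
curl -s http://driver-host:8080/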


HTH


Re: Starting executor without a master

Posted by Mathieu Longtin <ma...@closetwork.org>.
No master and no node manager, just the processes that do the actual work.

We use the "stand alone" version because we have a shared file system and a
way of allocating computing resources already (Univa Grid Engine). If an
executor were to die, we have other ways of restarting it; we don't need the
worker manager to deal with it.
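
For what it's worth, a hypothetical sketch of pushing a worker through the
grid engine (the qsub flags, host and port are placeholders, not our actual
submission wrapper):

# Ask Grid Engine for a slot and run a standalone worker in the foreground,
# pointed at the master running next to the driver, offering one core.
qsub -b y -cwd -N spark-worker \
    $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker \
    --cores 1 spark://driver-host:7077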


Re: Starting executor without a master

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Mathieu

What does this approach provide that the norm lacks?

So basically each node has its own master in this model.

Are these supposed to be individual standalone servers?


Thanks

