Posted to user@spark.apache.org by sujeet jog <su...@gmail.com> on 2016/05/28 15:42:58 UTC

local Vs Standalone cluster production deployment

Hi,

I have a question w.r.t. the production deployment mode of Spark.

I have 3 applications which I would like to run independently on a single
machine; I need to run the drivers on that same machine.

The amount of resources I have is also limited: 4-5GB RAM, 3-4 cores.

For deployment in standalone mode, I believe I need:

1 Driver JVM,  1 worker node ( 1 executor )
1 Driver JVM,  1 worker node ( 1 executor )
1 Driver JVM,  1 worker node ( 1 executor )

The issue here is that I will require 6 JVMs running in parallel, for which I
do not have sufficient CPU/MEM resources.

Hence I was looking more towards a local-mode deployment, and would like to
know if anybody is using local mode, where Driver + Executor run in a single
JVM, in production.

Are there any inherent issues upfront with using local mode for production
systems?

Re: local Vs Standalone cluster production deployment

Posted by Mich Talebzadeh <mi...@gmail.com>.
Actually I did some tests on this.

In a nutshell, when one runs Spark in local mode, that means one physical
machine that shares all the resources.

In this mode there is no concept of master or slave, and no need to start
$SPARK_HOME/sbin/start-master.sh or $SPARK_HOME/sbin/start-slave.sh. These
are not relevant here, as they relate to Spark running on a cluster, where
there is one master and many workers/slaves and Spark divides the job into
tasks that executors run on the worker nodes.
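
(For contrast, a quick sketch of the daemons one would start in standalone
cluster mode; <host> is a placeholder and the default master port 7077 is
assumed. In local mode neither is needed.)

$SPARK_HOME/sbin/start-master.sh                      # standalone master daemon
$SPARK_HOME/sbin/start-slave.sh spark://<host>:7077   # one worker, default port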

In general, a spark-submit job can be started like this:

${SPARK_HOME}/bin/spark-submit \
                --driver-memory=2G \
                --num-executors=1 \
                --executor-memory=2G \
                --master local \
                --executor-cores=2 \
                --class "${FILE_NAME}" \
                ${JAR_FILE}


Note that regardless of the num-executors setting, there will ONLY be one
executor for a given spark-submit in local mode (in Spark 1.6, --num-executors
is a YARN-only option in any case). Executors are the processes in charge of
running individual tasks in a given Spark job; they run the tasks assigned by
the driver program. In this case, one can only have one executor.
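
As a hedged illustration (MyApp and myapp.jar are placeholders, not from this
thread): in local mode the usable parallelism is the thread count N in
local[N], and --driver-memory sizes the single JVM hosting both driver and
executor, so the executor flags are moot:

# sketch only -- placeholder class and jar; only the flags matter
${SPARK_HOME}/bin/spark-submit \
                --master local[3] \
                --driver-memory 1500M \
                --class MyApp \
                myapp.jar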

Case in point: I am currently running two JVMs simultaneously, one doing
Spark Streaming and the other using a JDBC connection to import data from an
Oracle DB table into a Hive ORC table.

But each only has one executor. If you can live with that, it is fine. This
is the executor for the Spark Streaming job:

[screenshot: executor page of the Spark Streaming job]

And this one is for the JDBC job:

[screenshot: executor page of the JDBC job]

Note that the memory allocation breakdown is decided by Spark itself. For
example, I specified executor-memory=2G but only 1.247GB is allocated to
storage memory. There is only one executor created by its respective driver;
in other words, the executor is mapped to the driver.
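
(A rough back-of-envelope reading, not an exact figure: under Spark 1.6's
unified memory manager, storage + execution share about (usable heap - 300MB
reserved) * spark.memory.fraction, default 0.75, so a 2G heap gives on the
order of (2048 - 300) * 0.75 ~ 1300MB. The JVM reports somewhat less usable
heap than -Xmx, which would bring that down towards the 1.247GB shown.)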

With jps I see:

jps
12416 NodeManager
25315 QuorumPeerMain
23782 Kafka
11881 NameNode
12014 DataNode
13166 SparkSubmit
15568 SparkSubmit
26128 RunJar
12697 JobHistoryServer
25913 RunJar
12350 ResourceManager
12191 SecondaryNameNode
22143 Jps

Two SparkSubmit jobs and no master/worker processes.

ps aux|grep 13166
hduser   13166  7.1  4.1 3963140 1023300 pts/1 Sl+  23:19   1:58
/usr/java/latest/bin/java -cp
/home/hduser/jars/jconn4.jar:/home/hduser/jars/ojdbc6.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/conf/:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/hduser/hadoop-2.6.0/:/home/hduser/hadoop-2.6.0/etc/hadoop/
-Xms2G -Xmx2G org.apache.spark.deploy.SparkSubmit --master local --conf
spark.driver.memory=2G --conf
spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
--class CEP_streaming --num-executors 1 --executor-memory 2G
--executor-cores 2 --jars
/home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar
target/scala-2.10/CEP_streaming-assembly-1.0.jar



I think the execution model can be summarised as follows: either run Spark in
local mode, meaning one physical host, or in cluster mode, meaning a master
node and multiple worker nodes with a resource manager. If I take YARN as the
resource manager, then there will be one YARN resourcemanager daemon started
on the resourcemanager node and one nodemanager daemon started on each
slave/worker node.
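
As a hedged illustration of that cluster-mode alternative (assuming a working
YARN setup; the class and jar are the ones from this thread, the executor
counts are arbitrary):

# sketch only -- on YARN --num-executors actually takes effect, since
# YARN can start multiple executor containers on the nodemanagers
${SPARK_HOME}/bin/spark-submit \
                --master yarn \
                --deploy-mode client \
                --num-executors 2 \
                --executor-memory 2G \
                --executor-cores 2 \
                --class CEP_streaming \
                target/scala-2.10/CEP_streaming-assembly-1.0.jar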

Any ideas and comments are appreciated.

Thanks

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 28 May 2016 at 20:18, sujeet jog <su...@gmail.com> wrote:

> Great, Thanks.
>
> On Sun, May 29, 2016 at 12:38 AM, Chris Fregly <ch...@fregly.com> wrote:
>
>> btw, here's a handy Spark Config Generator by Ewan Higgs in Gent,
>> Belgium:
>>
>> code:  https://github.com/ehiggs/spark-config-gen
>>
>> demo:  http://ehiggs.github.io/spark-config-gen/
>>
>> my recent tweet on this:
>> https://twitter.com/cfregly/status/736631633927753729
>>
>> On Sat, May 28, 2016 at 10:50 AM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> hang on. Free is telling me you have 8GB of memory. I was under the
>>> impression that you had 4GB of RAM :)
>>>
>>> So with no app you have 3.99GB free ~ 4GB
>>>  1st app takes 428MB of memory and the second is 425MB so pretty lean
>>> apps
>>>
>>> The question is the apps that I run take 2-3GB each. But your mileage
>>> varies. If you end up with free memory running these minute apps and no
>>> sudden spike in memory/cpu usage then as long as they run and finish within
>>> SLA you should be OK whichever environment you run. May be you apps do not
>>> require that amount of memory.
>>>
>>> I don't think there is clear cut answer to NOT to use local mode in
>>> prod. Others may have different opinions on this.
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 28 May 2016 at 18:37, sujeet jog <su...@gmail.com> wrote:
>>>
>>>> ran these from muliple bash shell for now, probably a multi threaded
>>>> python script would do ,  memory and resource allocations are seen as
>>>> submitted parameters
>>>>
>>>>
>>>> *say before running any applications . *
>>>>
>>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:       8058568    *4066296*    3992272      10172     141368    1549520
>>>> -/+ buffers/cache:    2375408    5683160
>>>> Swap:      8290300     108672    8181628
>>>>
>>>>
>>>> *only 1 App : *
>>>>
>>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:       8058568    *4494488*    3564080      10172     141392    1549948
>>>> -/+ buffers/cache:    2803148    5255420
>>>> Swap:      8290300     108672    8181628
>>>>
>>>>
>>>> ran the single APP twice in parallel ( memory used double as expected )
>>>>
>>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:       8058568    *4919532*    3139036      10172     141444    1550376
>>>> -/+ buffers/cache:    3227712    4830856
>>>> Swap:      8290300     108672    8181628
>>>>
>>>>
>>>> Curious to know if local mode is used in real deployments where there
>>>> is a scarcity of resources.
>>>>
>>>>
>>>> Thanks,
>>>> Sujeet
>>>>
>>>> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> OK that is good news. So briefly how do you kick off spark-submit for
>>>>> each (or sparkConf). In terms of memory/resources allocations.
>>>>>
>>>>> Now what is the output of
>>>>>
>>>>> /usr/bin/free
>>>>>
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 28 May 2016 at 18:12, sujeet jog <su...@gmail.com> wrote:
>>>>>
>>>>>> Yes Mich,
>>>>>> They are currently emitting the results parallely,
>>>>>> http://localhost:4040 &  http://localhost:4041 , i also see the
>>>>>> monitoring from these URL's,
>>>>>>
>>>>>>
>>>>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> ok they are submitted but the latter one 14302 is it doing anything?
>>>>>>>
>>>>>>> can you check it with jmonitor or the logs created
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 28 May 2016 at 18:03, sujeet jog <su...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Ted,
>>>>>>>>
>>>>>>>> Thanks Mich,  yes i see that i can run two applications by
>>>>>>>> submitting these,  probably Driver + Executor running in a single JVM .
>>>>>>>> In-Process Spark.
>>>>>>>>
>>>>>>>> wondering if this can be used in production systems,  the reason
>>>>>>>> for me considering local instead of standalone cluster mode is purely
>>>>>>>> because of CPU/MEM resources,  i.e,  i currently do not have the liberty to
>>>>>>>> use 1 Driver & 1 Executor per application,    ( running in a embedded
>>>>>>>> network switch  )
>>>>>>>>
>>>>>>>>
>>>>>>>> jps output
>>>>>>>> [root@fos-elastic02 ~]# jps
>>>>>>>> 14258 SparkSubmit
>>>>>>>> 14503 Jps
>>>>>>>> 14302 SparkSubmit
>>>>>>>> ,
>>>>>>>>
>>>>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ok so you want to run all this in local mode. In other words
>>>>>>>>> something like below
>>>>>>>>>
>>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>>
>>>>>>>>>                 --master local[2] \
>>>>>>>>>
>>>>>>>>>                 --driver-memory 2G \
>>>>>>>>>
>>>>>>>>>                 --num-executors=1 \
>>>>>>>>>
>>>>>>>>>                 --executor-memory=2G \
>>>>>>>>>
>>>>>>>>>                 --executor-cores=2 \
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not sure it will work for multiple drivers (app/JVM).  The
>>>>>>>>> only way you can find out is to do try it running two apps simultaneously.
>>>>>>>>> You have a number of tools.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    1. use jps to see the apps and PID
>>>>>>>>>    2. use jmonitor to see memory/cpu/heap usage for each
>>>>>>>>>    spark-submit job
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28 May 2016 at 17:41, Ted Yu <yu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sujeet:
>>>>>>>>>>
>>>>>>>>>> Please also see:
>>>>>>>>>>
>>>>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>>>>
>>>>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Sujeet,
>>>>>>>>>>>
>>>>>>>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>>>>>>>
>>>>>>>>>>> In Standalone cluster mode Spark allocates resources based on
>>>>>>>>>>> cores. By default, an application will grab all the cores in the cluster.
>>>>>>>>>>>
>>>>>>>>>>> You only have one worker that lives within the driver JVM
>>>>>>>>>>> process that you start when you start the application with spark-shell or
>>>>>>>>>>> spark-submit in the host where the cluster manager is running.
>>>>>>>>>>>
>>>>>>>>>>> The Driver node runs on the same host that the cluster manager
>>>>>>>>>>> is running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>>>> tasks. The worker is tasked to create the executor (in this case there is
>>>>>>>>>>> only one executor) for the Driver. The Executor runs tasks for the Driver.
>>>>>>>>>>> Only one executor can be allocated on each worker per application. In your
>>>>>>>>>>> case you only have
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The minimum you will need will be 2-4G of RAM and two cores.
>>>>>>>>>>> Well that is my experience. Yes you can submit more than one spark-submit
>>>>>>>>>>> (the driver) but they may queue up behind the running one if there is not
>>>>>>>>>>> enough resources.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You pointed out that you will be running few applications in
>>>>>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>>>>>> machine for this purpose and the best option is to try running the first
>>>>>>>>>>> one, Check Web GUI on  4040 to see the progress of this Job. If you start
>>>>>>>>>>> the next JVM then assuming it is working, it will be using port 4041 and so
>>>>>>>>>>> forth.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In actual fact try the command "free" to see how much free
>>>>>>>>>>> memory you have.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <su...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a question w.r.t  production deployment mode of spark,
>>>>>>>>>>>>
>>>>>>>>>>>> I have 3 applications which i would like to run independently
>>>>>>>>>>>> on a single machine, i need to run the drivers in the same machine.
>>>>>>>>>>>>
>>>>>>>>>>>> The amount of resources i have is also limited, like 4- 5GB RAM
>>>>>>>>>>>> , 3 - 4 cores.
>>>>>>>>>>>>
>>>>>>>>>>>> For deployment in standalone mode : i believe i need
>>>>>>>>>>>>
>>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>>>
>>>>>>>>>>>> The issue here is i will require 6 JVM running in parallel, for
>>>>>>>>>>>> which i do not have sufficient CPU/MEM resources,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hence i was looking more towards a local mode deployment mode,
>>>>>>>>>>>> would like to know if anybody is using local mode where Driver + Executor
>>>>>>>>>>>> run in a single JVM in production mode.
>>>>>>>>>>>>
>>>>>>>>>>>> Are there any inherent issues upfront using local mode for
>>>>>>>>>>>> production base systems.?..
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> *Chris Fregly*
>> Research Scientist @ Flux Capacitor AI
>> "Bringing AI Back to the Future!"
>> San Francisco, CA
>> http://fluxcapacitor.ai
>>
>
>

Re: local Vs Standalone cluster production deployment

Posted by sujeet jog <su...@gmail.com>.
Great, Thanks.

On Sun, May 29, 2016 at 12:38 AM, Chris Fregly <ch...@fregly.com> wrote:

> btw, here's a handy Spark Config Generator by Ewan Higgs in Gent,
> Belgium:
>
> code:  https://github.com/ehiggs/spark-config-gen
>
> demo:  http://ehiggs.github.io/spark-config-gen/
>
> my recent tweet on this:
> https://twitter.com/cfregly/status/736631633927753729
>
> On Sat, May 28, 2016 at 10:50 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> hang on. Free is telling me you have 8GB of memory. I was under the
>> impression that you had 4GB of RAM :)
>>
>> So with no app you have 3.99GB free ~ 4GB
>>  1st app takes 428MB of memory and the second is 425MB so pretty lean apps
>>
>> The question is the apps that I run take 2-3GB each. But your mileage
>> varies. If you end up with free memory running these minute apps and no
>> sudden spike in memory/cpu usage then as long as they run and finish within
>> SLA you should be OK whichever environment you run. May be you apps do not
>> require that amount of memory.
>>
>> I don't think there is clear cut answer to NOT to use local mode in prod.
>> Others may have different opinions on this.
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 28 May 2016 at 18:37, sujeet jog <su...@gmail.com> wrote:
>>
>>> ran these from muliple bash shell for now, probably a multi threaded
>>> python script would do ,  memory and resource allocations are seen as
>>> submitted parameters
>>>
>>>
>>> *say before running any applications . *
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    *4066296*    3992272      10172     141368    1549520
>>> -/+ buffers/cache:    2375408    5683160
>>> Swap:      8290300     108672    8181628
>>>
>>>
>>> *only 1 App : *
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    *4494488*    3564080      10172     141392    1549948
>>> -/+ buffers/cache:    2803148    5255420
>>> Swap:      8290300     108672    8181628
>>>
>>>
>>> ran the single APP twice in parallel ( memory used double as expected )
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    *4919532*    3139036      10172     141444    1550376
>>> -/+ buffers/cache:    3227712    4830856
>>> Swap:      8290300     108672    8181628
>>>
>>>
>>> Curious to know if local mode is used in real deployments where there is
>>> a scarcity of resources.
>>>
>>>
>>> Thanks,
>>> Sujeet
>>>
>>> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> OK that is good news. So briefly how do you kick off spark-submit for
>>>> each (or sparkConf). In terms of memory/resources allocations.
>>>>
>>>> Now what is the output of
>>>>
>>>> /usr/bin/free
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 28 May 2016 at 18:12, sujeet jog <su...@gmail.com> wrote:
>>>>
>>>>> Yes Mich,
>>>>> They are currently emitting the results parallely,
>>>>> http://localhost:4040 &  http://localhost:4041 , i also see the
>>>>> monitoring from these URL's,
>>>>>
>>>>>
>>>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> ok they are submitted but the latter one 14302 is it doing anything?
>>>>>>
>>>>>> can you check it with jmonitor or the logs created
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28 May 2016 at 18:03, sujeet jog <su...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Ted,
>>>>>>>
>>>>>>> Thanks Mich,  yes i see that i can run two applications by
>>>>>>> submitting these,  probably Driver + Executor running in a single JVM .
>>>>>>> In-Process Spark.
>>>>>>>
>>>>>>> wondering if this can be used in production systems,  the reason for
>>>>>>> me considering local instead of standalone cluster mode is purely because
>>>>>>> of CPU/MEM resources,  i.e,  i currently do not have the liberty to use 1
>>>>>>> Driver & 1 Executor per application,    ( running in a embedded network
>>>>>>> switch  )
>>>>>>>
>>>>>>>
>>>>>>> jps output
>>>>>>> [root@fos-elastic02 ~]# jps
>>>>>>> 14258 SparkSubmit
>>>>>>> 14503 Jps
>>>>>>> 14302 SparkSubmit
>>>>>>> ,
>>>>>>>
>>>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ok so you want to run all this in local mode. In other words
>>>>>>>> something like below
>>>>>>>>
>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>
>>>>>>>>                 --master local[2] \
>>>>>>>>
>>>>>>>>                 --driver-memory 2G \
>>>>>>>>
>>>>>>>>                 --num-executors=1 \
>>>>>>>>
>>>>>>>>                 --executor-memory=2G \
>>>>>>>>
>>>>>>>>                 --executor-cores=2 \
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure it will work for multiple drivers (app/JVM).  The
>>>>>>>> only way you can find out is to do try it running two apps simultaneously.
>>>>>>>> You have a number of tools.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>    1. use jps to see the apps and PID
>>>>>>>>    2. use jmonitor to see memory/cpu/heap usage for each
>>>>>>>>    spark-submit job
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28 May 2016 at 17:41, Ted Yu <yu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sujeet:
>>>>>>>>>
>>>>>>>>> Please also see:
>>>>>>>>>
>>>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>>>
>>>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sujeet,
>>>>>>>>>>
>>>>>>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>>>>>>
>>>>>>>>>> In Standalone cluster mode Spark allocates resources based on
>>>>>>>>>> cores. By default, an application will grab all the cores in the cluster.
>>>>>>>>>>
>>>>>>>>>> You only have one worker that lives within the driver JVM process
>>>>>>>>>> that you start when you start the application with spark-shell or
>>>>>>>>>> spark-submit in the host where the cluster manager is running.
>>>>>>>>>>
>>>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>>> tasks. The worker is tasked to create the executor (in this case there is
>>>>>>>>>> only one executor) for the Driver. The Executor runs tasks for the Driver.
>>>>>>>>>> Only one executor can be allocated on each worker per application. In your
>>>>>>>>>> case you only have
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The minimum you will need will be 2-4G of RAM and two cores. Well
>>>>>>>>>> that is my experience. Yes you can submit more than one spark-submit (the
>>>>>>>>>> driver) but they may queue up behind the running one if there is not enough
>>>>>>>>>> resources.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You pointed out that you will be running few applications in
>>>>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>>>>> machine for this purpose and the best option is to try running the first
>>>>>>>>>> one, Check Web GUI on  4040 to see the progress of this Job. If you start
>>>>>>>>>> the next JVM then assuming it is working, it will be using port 4041 and so
>>>>>>>>>> forth.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In actual fact try the command "free" to see how much free memory
>>>>>>>>>> you have.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <su...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a question w.r.t  production deployment mode of spark,
>>>>>>>>>>>
>>>>>>>>>>> I have 3 applications which i would like to run independently on
>>>>>>>>>>> a single machine, i need to run the drivers in the same machine.
>>>>>>>>>>>
>>>>>>>>>>> The amount of resources i have is also limited, like 4- 5GB RAM
>>>>>>>>>>> , 3 - 4 cores.
>>>>>>>>>>>
>>>>>>>>>>> For deployment in standalone mode : i believe i need
>>>>>>>>>>>
>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>>
>>>>>>>>>>> The issue here is i will require 6 JVM running in parallel, for
>>>>>>>>>>> which i do not have sufficient CPU/MEM resources,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hence i was looking more towards a local mode deployment mode,
>>>>>>>>>>> would like to know if anybody is using local mode where Driver + Executor
>>>>>>>>>>> run in a single JVM in production mode.
>>>>>>>>>>>
>>>>>>>>>>> Are there any inherent issues upfront using local mode for
>>>>>>>>>>> production base systems.?..
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> *Chris Fregly*
> Research Scientist @ Flux Capacitor AI
> "Bringing AI Back to the Future!"
> San Francisco, CA
> http://fluxcapacitor.ai
>

Re: local Vs Standalone cluster production deployment

Posted by Chris Fregly <ch...@fregly.com>.
btw, here's a handy Spark Config Generator by Ewan Higgs in Gent,
Belgium:

code:  https://github.com/ehiggs/spark-config-gen

demo:  http://ehiggs.github.io/spark-config-gen/

my recent tweet on this:
https://twitter.com/cfregly/status/736631633927753729

On Sat, May 28, 2016 at 10:50 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> hang on. Free is telling me you have 8GB of memory. I was under the
> impression that you had 4GB of RAM :)
>
> So with no app you have 3.99GB free ~ 4GB
>  1st app takes 428MB of memory and the second is 425MB so pretty lean apps
>
> The question is the apps that I run take 2-3GB each. But your mileage
> varies. If you end up with free memory running these minute apps and no
> sudden spike in memory/cpu usage then as long as they run and finish within
> SLA you should be OK whichever environment you run. May be you apps do not
> require that amount of memory.
>
> I don't think there is clear cut answer to NOT to use local mode in prod.
> Others may have different opinions on this.
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 28 May 2016 at 18:37, sujeet jog <su...@gmail.com> wrote:
>
>> ran these from muliple bash shell for now, probably a multi threaded
>> python script would do ,  memory and resource allocations are seen as
>> submitted parameters
>>
>>
>> *say before running any applications . *
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    *4066296*    3992272      10172     141368    1549520
>> -/+ buffers/cache:    2375408    5683160
>> Swap:      8290300     108672    8181628
>>
>>
>> *only 1 App : *
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    *4494488*    3564080      10172     141392    1549948
>> -/+ buffers/cache:    2803148    5255420
>> Swap:      8290300     108672    8181628
>>
>>
>> ran the single APP twice in parallel ( memory used double as expected )
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    *4919532*    3139036      10172     141444    1550376
>> -/+ buffers/cache:    3227712    4830856
>> Swap:      8290300     108672    8181628
>>
>>
>> Curious to know if local mode is used in real deployments where there is
>> a scarcity of resources.
>>
>>
>> Thanks,
>> Sujeet
>>
>> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> OK that is good news. So briefly how do you kick off spark-submit for
>>> each (or sparkConf). In terms of memory/resources allocations.
>>>
>>> Now what is the output of
>>>
>>> /usr/bin/free
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 28 May 2016 at 18:12, sujeet jog <su...@gmail.com> wrote:
>>>
>>>> Yes Mich,
>>>> They are currently emitting the results parallely,
>>>> http://localhost:4040 &  http://localhost:4041 , i also see the
>>>> monitoring from these URL's,
>>>>
>>>>
>>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> ok they are submitted but the latter one 14302 is it doing anything?
>>>>>
>>>>> can you check it with jmonitor or the logs created
>>>>>
>>>>> HTH
>>>>>
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 28 May 2016 at 18:03, sujeet jog <su...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Ted,
>>>>>>
>>>>>> Thanks Mich,  yes i see that i can run two applications by submitting
>>>>>> these,  probably Driver + Executor running in a single JVM .  In-Process
>>>>>> Spark.
>>>>>>
>>>>>> wondering if this can be used in production systems,  the reason for
>>>>>> me considering local instead of standalone cluster mode is purely because
>>>>>> of CPU/MEM resources,  i.e,  i currently do not have the liberty to use 1
>>>>>> Driver & 1 Executor per application,    ( running in a embedded network
>>>>>> switch  )
>>>>>>
>>>>>>
>>>>>> jps output
>>>>>> [root@fos-elastic02 ~]# jps
>>>>>> 14258 SparkSubmit
>>>>>> 14503 Jps
>>>>>> 14302 SparkSubmit
>>>>>> ,
>>>>>>
>>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> Ok so you want to run all this in local mode. In other words
>>>>>>> something like below
>>>>>>>
>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>
>>>>>>>                 --master local[2] \
>>>>>>>
>>>>>>>                 --driver-memory 2G \
>>>>>>>
>>>>>>>                 --num-executors=1 \
>>>>>>>
>>>>>>>                 --executor-memory=2G \
>>>>>>>
>>>>>>>                 --executor-cores=2 \
>>>>>>>
>>>>>>>
>>>>>>> I am not sure it will work for multiple drivers (app/JVM).  The only
>>>>>>> way you can find out is to do try it running two apps simultaneously. You
>>>>>>> have a number of tools.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>    1. use jps to see the apps and PID
>>>>>>>    2. use jmonitor to see memory/cpu/heap usage for each
>>>>>>>    spark-submit job
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 28 May 2016 at 17:41, Ted Yu <yu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sujeet:
>>>>>>>>
>>>>>>>> Please also see:
>>>>>>>>
>>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>>
>>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Sujeet,
>>>>>>>>>
>>>>>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>>>>>
>>>>>>>>> In Standalone cluster mode Spark allocates resources based on
>>>>>>>>> cores. By default, an application will grab all the cores in the cluster.
>>>>>>>>>
>>>>>>>>> You only have one worker that lives within the driver JVM process
>>>>>>>>> that you start when you start the application with spark-shell or
>>>>>>>>> spark-submit in the host where the cluster manager is running.
>>>>>>>>>
>>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>> tasks. The worker is tasked to create the executor (in this case there is
>>>>>>>>> only one executor) for the Driver. The Executor runs tasks for the Driver.
>>>>>>>>> Only one executor can be allocated on each worker per application. In your
>>>>>>>>> case you only have
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The minimum you will need will be 2-4G of RAM and two cores. Well
>>>>>>>>> that is my experience. Yes you can submit more than one spark-submit (the
>>>>>>>>> driver) but they may queue up behind the running one if there is not enough
>>>>>>>>> resources.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You pointed out that you will be running few applications in
>>>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>>>> machine for this purpose and the best option is to try running the first
>>>>>>>>> one, Check Web GUI on  4040 to see the progress of this Job. If you start
>>>>>>>>> the next JVM then assuming it is working, it will be using port 4041 and so
>>>>>>>>> forth.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In actual fact try the command "free" to see how much free memory
>>>>>>>>> you have.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <su...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have a question w.r.t  production deployment mode of spark,
>>>>>>>>>>
>>>>>>>>>> I have 3 applications which i would like to run independently on
>>>>>>>>>> a single machine, i need to run the drivers in the same machine.
>>>>>>>>>>
>>>>>>>>>> The amount of resources i have is also limited, like 4- 5GB RAM ,
>>>>>>>>>> 3 - 4 cores.
>>>>>>>>>>
>>>>>>>>>> For deployment in standalone mode : i believe i need
>>>>>>>>>>
>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>
>>>>>>>>>> The issue here is i will require 6 JVM running in parallel, for
>>>>>>>>>> which i do not have sufficient CPU/MEM resources,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hence i was looking more towards a local mode deployment mode,
>>>>>>>>>> would like to know if anybody is using local mode where Driver + Executor
>>>>>>>>>> run in a single JVM in production mode.
>>>>>>>>>>
>>>>>>>>>> Are there any inherent issues upfront using local mode for
>>>>>>>>>> production base systems.?..
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
*Chris Fregly*
Research Scientist @ Flux Capacitor AI
"Bringing AI Back to the Future!"
San Francisco, CA
http://fluxcapacitor.ai

Re: local Vs Standalone cluster production deployment

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hang on. free is telling me you have 8GB of memory; I was under the
impression that you had 4GB of RAM :)

So with no app you have 3.99GB free, ~4GB. The 1st app takes 428MB of memory
and the second 425MB, so pretty lean apps.

The thing is, the apps that I run take 2-3GB each, but your mileage may vary.
If you end up with free memory while running these minute apps and no sudden
spike in memory/CPU usage, then as long as they run and finish within SLA you
should be OK in whichever environment you run. Maybe your apps do not require
that amount of memory.

I don't think there is a clear-cut answer on NOT using local mode in prod.
Others may have different opinions on this.

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 28 May 2016 at 18:37, sujeet jog <su...@gmail.com> wrote:

> ran these from muliple bash shell for now, probably a multi threaded
> python script would do ,  memory and resource allocations are seen as
> submitted parameters
>
>
> *say before running any applications . *
>
> [root@fos-elastic02 ~]# /usr/bin/free
>              total       used       free     shared    buffers     cached
> Mem:       8058568    *4066296*    3992272      10172     141368    1549520
> -/+ buffers/cache:    2375408    5683160
> Swap:      8290300     108672    8181628
>
>
> *only 1 App : *
>
> [root@fos-elastic02 ~]# /usr/bin/free
>              total       used       free     shared    buffers     cached
> Mem:       8058568    *4494488*    3564080      10172     141392    1549948
> -/+ buffers/cache:    2803148    5255420
> Swap:      8290300     108672    8181628
>
>
> ran the single APP twice in parallel ( memory used double as expected )
>
> [root@fos-elastic02 ~]# /usr/bin/free
>              total       used       free     shared    buffers     cached
> Mem:       8058568    *4919532*    3139036      10172     141444    1550376
> -/+ buffers/cache:    3227712    4830856
> Swap:      8290300     108672    8181628
>
>
> Curious to know if local mode is used in real deployments where there is a
> scarcity of resources.
>
>
> Thanks,
> Sujeet
>
> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> OK that is good news. So briefly how do you kick off spark-submit for
>> each (or sparkConf). In terms of memory/resources allocations.
>>
>> Now what is the output of
>>
>> /usr/bin/free
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 28 May 2016 at 18:12, sujeet jog <su...@gmail.com> wrote:
>>
>>> Yes Mich,
>>> They are currently emitting the results parallely,
>>> http://localhost:4040 &  http://localhost:4041 , i also see the
>>> monitoring from these URL's,
>>>
>>>
>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> ok they are submitted but the latter one 14302 is it doing anything?
>>>>
>>>> can you check it with jmonitor or the logs created
>>>>
>>>> HTH
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 28 May 2016 at 18:03, sujeet jog <su...@gmail.com> wrote:
>>>>
>>>>> Thanks Ted,
>>>>>
>>>>> Thanks Mich,  yes i see that i can run two applications by submitting
>>>>> these,  probably Driver + Executor running in a single JVM .  In-Process
>>>>> Spark.
>>>>>
>>>>> wondering if this can be used in production systems,  the reason for
>>>>> me considering local instead of standalone cluster mode is purely because
>>>>> of CPU/MEM resources,  i.e,  i currently do not have the liberty to use 1
>>>>> Driver & 1 Executor per application,    ( running in a embedded network
>>>>> switch  )
>>>>>
>>>>>
>>>>> jps output
>>>>> [root@fos-elastic02 ~]# jps
>>>>> 14258 SparkSubmit
>>>>> 14503 Jps
>>>>> 14302 SparkSubmit
>>>>> ,
>>>>>
>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Ok so you want to run all this in local mode. In other words
>>>>>> something like below
>>>>>>
>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>
>>>>>>                 --master local[2] \
>>>>>>
>>>>>>                 --driver-memory 2G \
>>>>>>
>>>>>>                 --num-executors=1 \
>>>>>>
>>>>>>                 --executor-memory=2G \
>>>>>>
>>>>>>                 --executor-cores=2 \
>>>>>>
>>>>>>
>>>>>> I am not sure it will work for multiple drivers (app/JVM).  The only
>>>>>> way you can find out is to do try it running two apps simultaneously. You
>>>>>> have a number of tools.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    1. use jps to see the apps and PID
>>>>>>    2. use jmonitor to see memory/cpu/heap usage for each
>>>>>>    spark-submit job
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28 May 2016 at 17:41, Ted Yu <yu...@gmail.com> wrote:
>>>>>>
>>>>>>> Sujeet:
>>>>>>>
>>>>>>> Please also see:
>>>>>>>
>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>
>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Sujeet,
>>>>>>>>
>>>>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>>>>
>>>>>>>> In Standalone cluster mode Spark allocates resources based on
>>>>>>>> cores. By default, an application will grab all the cores in the cluster.
>>>>>>>>
>>>>>>>> You only have one worker that lives within the driver JVM process
>>>>>>>> that you start when you start the application with spark-shell or
>>>>>>>> spark-submit in the host where the cluster manager is running.
>>>>>>>>
>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>>> tasks. The worker is tasked to create the executor (in this case there is
>>>>>>>> only one executor) for the Driver. The Executor runs tasks for the Driver.
>>>>>>>> Only one executor can be allocated on each worker per application. In your
>>>>>>>> case you only have
>>>>>>>>
>>>>>>>>
>>>>>>>> The minimum you will need will be 2-4G of RAM and two cores. Well
>>>>>>>> that is my experience. Yes you can submit more than one spark-submit (the
>>>>>>>> driver) but they may queue up behind the running one if there is not enough
>>>>>>>> resources.
>>>>>>>>
>>>>>>>>
>>>>>>>> You pointed out that you will be running few applications in
>>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>>> machine for this purpose and the best option is to try running the first
>>>>>>>> one, Check Web GUI on  4040 to see the progress of this Job. If you start
>>>>>>>> the next JVM then assuming it is working, it will be using port 4041 and so
>>>>>>>> forth.
>>>>>>>>
>>>>>>>>
>>>>>>>> In actual fact try the command "free" to see how much free memory
>>>>>>>> you have.
>>>>>>>>
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <su...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a question w.r.t  production deployment mode of spark,
>>>>>>>>>
>>>>>>>>> I have 3 applications which i would like to run independently on a
>>>>>>>>> single machine, i need to run the drivers in the same machine.
>>>>>>>>>
>>>>>>>>> The amount of resources i have is also limited, like 4- 5GB RAM ,
>>>>>>>>> 3 - 4 cores.
>>>>>>>>>
>>>>>>>>> For deployment in standalone mode : i believe i need
>>>>>>>>>
>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>
>>>>>>>>> The issue here is i will require 6 JVM running in parallel, for
>>>>>>>>> which i do not have sufficient CPU/MEM resources,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hence i was looking more towards a local mode deployment mode,
>>>>>>>>> would like to know if anybody is using local mode where Driver + Executor
>>>>>>>>> run in a single JVM in production mode.
>>>>>>>>>
>>>>>>>>> Are there any inherent issues upfront using local mode for
>>>>>>>>> production base systems.?..
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: local Vs Standalone cluster production deployment

Posted by sujeet jog <su...@gmail.com>.
I ran these from multiple bash shells for now; a multi-threaded Python script
would probably do as well. Memory and resource allocations come out as per the
submitted parameters; see the sketch below for driving both from one script.
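
A hedged sketch (App1/App2 and the jars are placeholders, not the real apps):

# launch two local-mode drivers in parallel from one script
${SPARK_HOME}/bin/spark-submit --master local[2] --class App1 app1.jar &
${SPARK_HOME}/bin/spark-submit --master local[2] --class App2 app2.jar &
wait    # returns once both drivers have finished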


*say before running any applications . *

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    *4066296*    3992272      10172     141368    1549520
-/+ buffers/cache:    2375408    5683160
Swap:      8290300     108672    8181628


*only 1 App : *

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    *4494488*    3564080      10172     141392    1549948
-/+ buffers/cache:    2803148    5255420
Swap:      8290300     108672    8181628


ran the single APP twice in parallel ( memory used double as expected )

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    *4919532*    3139036      10172     141444    1550376
-/+ buffers/cache:    3227712    4830856
Swap:      8290300     108672    8181628


Curious to know if local mode is used in real deployments where there is a
scarcity of resources.


Thanks,
Sujeet

On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> OK that is good news. So briefly how do you kick off spark-submit for each
> (or sparkConf). In terms of memory/resources allocations.
>
> Now what is the output of
>
> /usr/bin/free
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 28 May 2016 at 18:12, sujeet jog <su...@gmail.com> wrote:
>
>> Yes Mich,
>> They are currently emitting the results parallely,
>> http://localhost:4040 &  http://localhost:4041 , i also see the
>> monitoring from these URL's,
>>
>>
>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> ok they are submitted but the latter one 14302 is it doing anything?
>>>
>>> can you check it with jmonitor or the logs created
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 28 May 2016 at 18:03, sujeet jog <su...@gmail.com> wrote:
>>>
>>>> Thanks Ted,
>>>>
>>>> Thanks Mich,  yes i see that i can run two applications by submitting
>>>> these,  probably Driver + Executor running in a single JVM .  In-Process
>>>> Spark.
>>>>
>>>> wondering if this can be used in production systems,  the reason for me
>>>> considering local instead of standalone cluster mode is purely because of
>>>> CPU/MEM resources,  i.e,  i currently do not have the liberty to use 1
>>>> Driver & 1 Executor per application,    ( running in a embedded network
>>>> switch  )
>>>>
>>>>
>>>> jps output
>>>> [root@fos-elastic02 ~]# jps
>>>> 14258 SparkSubmit
>>>> 14503 Jps
>>>> 14302 SparkSubmit
>>>> ,
>>>>
>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Ok so you want to run all this in local mode. In other words something
>>>>> like below
>>>>>
>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>
>>>>>                 --master local[2] \
>>>>>
>>>>>                 --driver-memory 2G \
>>>>>
>>>>>                 --num-executors=1 \
>>>>>
>>>>>                 --executor-memory=2G \
>>>>>
>>>>>                 --executor-cores=2 \
>>>>>
>>>>>
>>>>> I am not sure it will work for multiple drivers (app/JVM).  The only
>>>>> way you can find out is to do try it running two apps simultaneously. You
>>>>> have a number of tools.
>>>>>
>>>>>
>>>>>
>>>>>    1. use jps to see the apps and PID
>>>>>    2. use jmonitor to see memory/cpu/heap usage for each spark-submit
>>>>>    job
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 28 May 2016 at 17:41, Ted Yu <yu...@gmail.com> wrote:
>>>>>
>>>>>> Sujeet:
>>>>>>
>>>>>> Please also see:
>>>>>>
>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>
>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sujeet,
>>>>>>>
>>>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>>>
>>>>>>> In Standalone cluster mode Spark allocates resources based on
>>>>>>> cores. By default, an application will grab all the cores in the cluster.
>>>>>>>
>>>>>>> You only have one worker that lives within the driver JVM process
>>>>>>> that you start when you start the application with spark-shell or
>>>>>>> spark-submit in the host where the cluster manager is running.
>>>>>>>
>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>> tasks. The worker is tasked to create the executor (in this case there is
>>>>>>> only one executor) for the Driver. The Executor runs tasks for the Driver.
>>>>>>> Only one executor can be allocated on each worker per application. In your
>>>>>>> case you only have
>>>>>>>
>>>>>>>
>>>>>>> The minimum you will need will be 2-4G of RAM and two cores. Well
>>>>>>> that is my experience. Yes you can submit more than one spark-submit (the
>>>>>>> driver) but they may queue up behind the running one if there is not enough
>>>>>>> resources.
>>>>>>>
>>>>>>>
>>>>>>> You pointed out that you will be running few applications in
>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>> machine for this purpose and the best option is to try running the first
>>>>>>> one, Check Web GUI on  4040 to see the progress of this Job. If you start
>>>>>>> the next JVM then assuming it is working, it will be using port 4041 and so
>>>>>>> forth.
>>>>>>>
>>>>>>>
>>>>>>> In actual fact try the command "free" to see how much free memory
>>>>>>> you have.
>>>>>>>
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 28 May 2016 at 16:42, sujeet jog <su...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a question w.r.t  production deployment mode of spark,
>>>>>>>>
>>>>>>>> I have 3 applications which i would like to run independently on a
>>>>>>>> single machine, i need to run the drivers in the same machine.
>>>>>>>>
>>>>>>>> The amount of resources i have is also limited, like 4- 5GB RAM , 3
>>>>>>>> - 4 cores.
>>>>>>>>
>>>>>>>> For deployment in standalone mode : i believe i need
>>>>>>>>
>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>
>>>>>>>> The issue here is i will require 6 JVM running in parallel, for
>>>>>>>> which i do not have sufficient CPU/MEM resources,
>>>>>>>>
>>>>>>>>
>>>>>>>> Hence i was looking more towards a local mode deployment mode,
>>>>>>>> would like to know if anybody is using local mode where Driver + Executor
>>>>>>>> run in a single JVM in production mode.
>>>>>>>>
>>>>>>>> Are there any inherent issues upfront using local mode for
>>>>>>>> production base systems.?..
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: local Vs Standalone cluster production deployment

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, that is good news. So briefly, how do you kick off spark-submit for each
(or SparkConf), in terms of memory/resource allocations?

Now what is the output of

/usr/bin/free



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: local Vs Standalonecluster production deployment

Posted by sujeet jog <su...@gmail.com>.
Yes Mich,
They are currently emitting results in parallel at http://localhost:4040 and
http://localhost:4041; I can also see the monitoring from these URLs.
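
One can also query the monitoring REST API that each UI exposes (a sketch;
the /api/v1 endpoint has been part of Spark since 1.4):

curl http://localhost:4040/api/v1/applications
curl http://localhost:4041/api/v1/applications

Each call returns a small JSON array describing the application behind that
UI port, which confirms both drivers are alive independently.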



Re: local Vs Standalonecluster production deployment

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, they are submitted, but is the latter one (PID 14302) actually doing
anything?

Can you check it with jmonitor or the logs it creates?
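
For example, from the shell (a sketch; assumes a JDK so that jstat is on the
PATH, and the log path is only a placeholder):

# sample GC/heap utilisation of the second SparkSubmit JVM every 5 seconds
jstat -gcutil 14302 5000

# or watch its driver log for task activity
tail -f /path/to/second-app/driver.log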

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: local Vs Standalonecluster production deployment

Posted by sujeet jog <su...@gmail.com>.
Thanks Ted,

Thanks Mich, yes, I see that I can run two applications by submitting these,
probably with Driver + Executor running in a single JVM, i.e. in-process
Spark.

I am wondering whether this can be used in production systems. The reason I
am considering local mode instead of standalone cluster mode is purely
CPU/memory resources; i.e., I currently do not have the liberty to use one
Driver and one Executor per application (this is running in an embedded
network switch).


jps output
[root@fos-elastic02 ~]# jps
14258 SparkSubmit
14503 Jps
14302 SparkSubmit
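
That output is consistent with local mode: each SparkSubmit process is the
driver and the executor in one JVM. A quick way to confirm it (a sketch,
assuming a JDK with jstack) is to look for executor task threads inside one
of the PIDs:

jstack 14258 | grep "Executor task launch"

In local mode the "Executor task launch worker" threads live inside the
SparkSubmit process itself rather than in a separate
CoarseGrainedExecutorBackend JVM.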


Re: local Vs Standalonecluster production deployment

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, so you want to run all of this in local mode. In other words, something
like below (the final --class and JAR are placeholders for your own
application):

${SPARK_HOME}/bin/spark-submit \
                --master local[2] \
                --driver-memory 2G \
                --num-executors=1 \
                --executor-memory=2G \
                --executor-cores=2 \
                --class com.example.YourApp \
                your-app.jar


I am not sure it will work for multiple drivers (one app per JVM). The only
way to find out is to try running two apps simultaneously; see the sketch
after the list below. You have a number of tools:



   1. use jps to see the apps and PID
   2. use jmonitor to see memory/cpu/heap usage for each spark-submit job
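
For instance, something like this (a sketch; class names and JARs are
placeholders for your own applications):

nohup ${SPARK_HOME}/bin/spark-submit --master local[2] --driver-memory 2G \
      --class com.example.App1 app1.jar > app1.log 2>&1 &
nohup ${SPARK_HOME}/bin/spark-submit --master local[2] --driver-memory 2G \
      --class com.example.App2 app2.jar > app2.log 2>&1 &

jps    # two SparkSubmit PIDs should appear if both came up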

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: local Vs Standalonecluster production deployment

Posted by Ted Yu <yu...@gmail.com>.
Sujeet:

Please also see:

https://spark.apache.org/docs/latest/spark-standalone.html


Re: local Vs Standalonecluster production deployment

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Sujeet,

If you have a single machine then it is Spark standalone mode.

In standalone cluster mode, Spark allocates resources based on cores. By
default, an application will grab all the cores in the cluster.
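
That appetite can be capped per application, which is what would let three
standalone apps coexist on a 3-4 core box (a sketch using the standard
spark.cores.max setting; the master URL, memory figures, class and JAR are
placeholders):

${SPARK_HOME}/bin/spark-submit \
                --master spark://localhost:7077 \
                --conf spark.cores.max=1 \
                --driver-memory 512M \
                --executor-memory 1G \
                --class com.example.App app.jar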

You only have one worker, and it lives within the driver JVM process that you
start when you launch the application with spark-shell or spark-submit on the
host where the cluster manager is running.

The Driver runs on the same host as the cluster manager. The Driver requests
resources from the Cluster Manager to run tasks. The worker is tasked with
creating the executor (in this case there is only one executor) for the
Driver, and the Executor runs tasks for the Driver. Only one executor can be
allocated on each worker per application. In your case you only have one
host, and hence one worker.


The minimum you will need is 2-4GB of RAM and two cores; well, that is my
experience. Yes, you can submit more than one spark-submit (i.e. more than
one driver), but they may queue up behind the running one if there are not
enough resources.


You pointed out that you will be running a few applications in parallel on
the same host. The likelihood is that you are using a VM for this purpose,
and the best option is to try running the first one and check the web GUI on
port 4040 to see the progress of that job. If you start the next JVM then,
assuming it is working, it will use port 4041 and so forth.
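
If you prefer not to rely on that automatic port probing, each application's
UI port can be pinned explicitly with the standard spark.ui.port setting (a
sketch; class and JAR are placeholders):

${SPARK_HOME}/bin/spark-submit --master local[2] \
                --conf spark.ui.port=4050 \
                --class com.example.App app.jar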


In actual fact, try the command "free" to see how much free memory you have.


HTH





Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


