Posted to user@spark.apache.org by "on" <sc...@web.de> on 2016/07/25 18:19:44 UTC

Re: Performance tuning for local mode on one host

OK, sorry, I am running in local mode.
Just a very small setup...

(changed the subject)

On 25.07.2016 20:01, Mich Talebzadeh wrote:
> Hi,
>
> From your reference I can see that you are running in local mode with
> two cores. But that is not standalone.
>
> Can you please clarify whether you start the master and slave processes?
> Those are only needed for standalone mode.
>
> sbin/start-master.sh
> sbin/start-slaves.sh
>
> HTH
>
> Dr Mich Talebzadeh
>
>  
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>  
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any loss, damage or destruction of data or any other property which
> may arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary
> damages arising from such loss, damage or destruction.
>
>  
>
>
> On 25 July 2016 at 18:21, on <schueler_1234@web.de> wrote:
>
>     Dear all,
>
>     I am running Spark on one host ("local[2]"), doing calculations
>     like this on a socket stream:
>
>     mainStream = socketStream.filter(lambda msg:
>         msg['header'].startswith('test')).map(lambda x: (x['host'], x))
>     s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x))
>     s2 = mainStream.updateStateByKey(updateSecond,
>         initialRDD=initialMachineStates).map(lambda x: (2, x))
>     out.join(bla2).foreachRDD(no_out)
>
>     I measured that each calculation alone has a processing time of about
>     400 ms, but the processing time of the combined code above is over
>     3 s on average.
>
>     I know there are a lot of unknown parameters here, but does anybody
>     have hints on how to tune this code / system? I have already changed
>     a lot of parameters, such as #executors, #cores and so on.
>
>     Thanks in advance and best regards,
>     on
>
>     ---------------------------------------------------------------------
>     To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
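[Editor's note: the per-key state maintenance that updateStateByKey performs can be sketched in plain Python, without Spark. The poster's updateFirst/updateSecond functions are not shown in the thread, so a hypothetical counting update function stands in for them here.]

```python
# Plain-Python sketch of updateStateByKey semantics: each micro-batch,
# the update function receives the new values for a key plus that key's
# previous state, and returns the new state.
# "update_count" is a hypothetical stand-in for updateFirst/updateSecond.

def update_count(new_values, last_state):
    # Count events seen so far for this key (state is None on first sight).
    return (last_state or 0) + len(new_values)

def run_batch(state, batch):
    """Apply the update function to one micro-batch of (key, value) pairs."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    # Keys absent from this batch still get the update call with no new values.
    for key in set(state) | set(grouped):
        state[key] = update_count(grouped.get(key, []), state.get(key))
    return state

state = {}
state = run_batch(state, [("hostA", 1), ("hostA", 2), ("hostB", 3)])
state = run_batch(state, [("hostB", 4)])
print(state)  # {'hostA': 2, 'hostB': 2}
```

Note that running two independent updateStateByKey chains off the same mainStream, as in the post above, maintains two separate state tables per batch, which is part of why the combined pipeline costs more than either stage alone.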


Re: Performance tuning for local mode on one host

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi On,

When you run in local mode there is only one SparkSubmit process with a
single executor. How many cores do you have?

Each core allows one more task to run concurrently, so with local[8] you
will have 8 tasks running the same code, each on a subset of your data.

So do

cat /proc/cpuinfo|grep processor|wc -l


and determine how many logical processors (AKA cores) you have.
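[Editor's note: the same logical-processor count can be read from Python, which is convenient when choosing the N in local[N]:]

```python
import os

# Number of logical processors visible to the OS (the same count that
# /proc/cpuinfo lists); a common starting point for local[N].
cores = os.cpu_count()
print(cores)
```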


HTH




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
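[Editor's note: the per-stage timings quoted earlier (about 400 ms per calculation versus over 3 s combined) can be collected with a small wall-clock timing helper. This is a minimal, Spark-independent sketch; for real streaming jobs, the per-batch processing times shown in the Spark UI are the more reliable source.]

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example: time a stand-in workload (sleep simulates a 50 ms stage).
_, elapsed = time_call(time.sleep, 0.05)
print(f"stage took {elapsed * 1000:.1f} ms")
```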