Posted to user@spark.apache.org by "Livni, Dana" <da...@intel.com> on 2014/02/23 21:57:04 UTC

High CPU usage

Hi all,

We have written an application which uses Spark over HBase.
We are using YARN as the resource manager.
Currently, each time the application runs it raises a Spark context, runs a few mappers, and then finishes.
In each run, no matter how much data it processes, we see a spike in CPU usage on the cluster nodes.
With no data at all the spike is around 15%, and with a large amount of data it can reach even 30% CPU usage.
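For reference, the flow is roughly like the sketch below (simplified; the table name, app name, and the final count step are placeholders rather than our real code):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.SparkContext

    object HBaseJob {
      def main(args: Array[String]): Unit = {
        // Raise the Spark context against YARN.
        val sc = new SparkContext("yarn-client", "hbase-job")

        // Read an HBase table through the standard TableInputFormat.
        val hbaseConf = HBaseConfiguration.create()
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table") // placeholder

        val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
          classOf[ImmutableBytesWritable], classOf[Result])

        // A few mappers, then the application finishes.
        println("rows: " + rows.map(_._2).count())
        sc.stop()
      }
    }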

We expected to see a change in memory usage, but not in CPU usage.
Is this the normal behavior? Is it caused merely by raising the Spark context?
Or are we doing something wrong?

The main problem with this situation arises when we need to run multiple instances of the application.
In that case the CPU of the nodes constantly spikes to 100%, and all the application runs become very slow.

Thanks
Dana.


---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Re: High CPU usage

Posted by Mayur Rustagi <ma...@gmail.com>.
Is the cause of the spike the driver node? (It's a highly likely candidate.)
If so, you can shift the driver off the master and even onto other slave nodes.
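For example, something like this (untested sketch: in yarn-client mode the driver runs on whichever machine launches the app, so launching from a slave moves the driver load there; the hostname is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Launch this program from a slave node instead of the master so the
    // driver JVM, and its CPU load, live there. "slave-01" is a placeholder
    // and must be resolvable by the executors.
    val conf = new SparkConf()
      .setAppName("hbase-job")
      .setMaster("yarn-client")
      .set("spark.driver.host", "slave-01")
    val sc = new SparkContext(conf)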


Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi


