Posted to user@spark.apache.org by Mayur Rustagi <ma...@gmail.com> on 2014/07/02 08:40:03 UTC

Re: Help alleviating OOM errors

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Mon, Jun 30, 2014 at 8:09 PM, Yana Kadiyska <ya...@gmail.com>
wrote:

> Hi,
>
> our cluster seems to have a really hard time with OOM errors on the
> executor. Periodically we'd see a task that gets sent to a few
> executors, one would OOM, and then the job just stays active for hours
> (sometimes 30+ whereas normally it completes sub-minute).
>
> So I have a few questions:
>
> 1. Why am I seeing OOMs to begin with?
>
> I'm running with defaults for
> spark.storage.memoryFraction
> spark.shuffle.memoryFraction
>
> so my understanding is that if Spark exceeds 60% of available memory,
> data will be spilled to disk? Am I misunderstanding this? In the
> attached screenshot, I see a single stage with 2 tasks on the same
> executor -- no disk spills but OOM.
>
You need to configure spark.shuffle.spill to true again in the config.
As for what is causing you to OOM, it could be that you are simply doing a
sortByKey and the keys are bigger than the executor's memory, causing the
OOM. Can you post the stack trace?
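
If it turns out to be the sortByKey, one thing you can try is giving it more
partitions so each task sorts a smaller slice. A rough sketch (Scala; the RDD
name and partition count here are made up, tune them for your data):

    // pairs is assumed to be an existing key/value RDD, e.g. RDD[(String, Int)].
    // Spread the sort over more, smaller partitions so no single task has to
    // hold too many keys in memory at once.
    val sorted = pairs.sortByKey(ascending = true, numPartitions = 200)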

>
> 2. How can I reduce the likelihood of seeing OOMs? I am a bit
> concerned that I don't see a spill at all, so I'm not sure whether
> decreasing spark.storage.memoryFraction is what needs to be done.
>


>
> 3. Why does an OOM seem to break the executor so hopelessly? I am
> seeing times upwards of 30hrs once an OOM occurs. Why is that -- the
> task *should* take under a minute, so even if the whole RDD was
> recomputed from scratch, 30hrs is very mysterious to me. Hadoop can
> process this in about 10-15 minutes, so I imagine even if the whole
> job went to disk it should still not take more than an hour
>
When an OOM occurs it can cause the RDD to spill to disk; the repeated task
may be forced to read the data from disk and cause the overall slowdown, not
to mention that the RDD may be sent to a different executor to be processed.
Are you seeing the slow tasks as process_local, or at least node_local?

>
> Any insight into this would be much appreciated.
> Running Spark 0.9.1
>

Re: Help alleviating OOM errors

Posted by Andrew Or <an...@databricks.com>.
Hi Yana,

In 0.9.1, spark.shuffle.spill is set to true by default so you shouldn't
need to manually set it.

Here are a few common causes of OOMs in Spark:

- Too few partitions: if one partition is too big, it may cause an OOM if
there is not enough space to unroll the entire partition in memory. This
will be fixed in Spark 1.1, but until then you can try to increase the
number of partitions so each core on each executor has less data to handle
(see the sketch after this list).

- Your application is using a lot of memory on its own: Spark by default
assumes that it has 90% of the runtime memory in your JVM. If your
application is super memory-intensive (e.g. creates large data structures),
then I would either try to reduce the memory footprint of your application
itself if possible, or reduce the amount of memory Spark thinks it owns.
For the latter, I would reduce spark.storage.memoryFraction.
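
For illustration, a rough sketch of both suggestions (Scala, 0.9.x API; the
app name, input path, partition count, and fraction value are all made-up
placeholders showing the shape, tune them for your workload):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD functions (reduceByKey etc.)

    // Shrink Spark's cache share if your own code needs more of the heap.
    // The default for spark.storage.memoryFraction is 0.6; 0.4 is just an example.
    val conf = new SparkConf()
      .setAppName("oom-repro")                      // hypothetical app name
      .set("spark.storage.memoryFraction", "0.4")
    val sc = new SparkContext(conf)

    // More, smaller partitions mean each task holds less data in memory at once.
    val lines  = sc.textFile("hdfs:///some/input", 400)   // hypothetical path
    val pairs  = lines.map(line => (line.split(",")(0), 1))
    val counts = pairs.reduceByKey(_ + _, 400)             // explicit partition count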

To answer your questions:

- Yes, Spark will spill your RDD partitions to disk if they exceed 60% of
your runtime memory, but ONLY if you persisted them with MEMORY_AND_DISK.
The default is MEMORY_ONLY, which means the RDDs will just be kicked out
of the cache if your cache is full (see the sketch after this list).

- In general, if an OOM occurs it would be good to re-run your application
after making the changes suggested above. If one
executor dies because of an OOM, another executor might also run into it.
Even if the original task that caused an OOM is re-scheduled on a different
executor, it is likely that the other executor will also die of the same
problem.
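
To illustrate the persistence point, continuing the sketch above (the RDD
name is hypothetical):

    import org.apache.spark.storage.StorageLevel

    // With MEMORY_AND_DISK, partitions that don't fit in the cache are written
    // to disk instead of being evicted and recomputed later.
    val cached = counts.persist(StorageLevel.MEMORY_AND_DISK)
    cached.count()  // materialize the cache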

Best,
Andrew


2014-07-02 6:22 GMT-07:00 Yana Kadiyska <ya...@gmail.com>:

> Can you elaborate why "You need to configure spark.shuffle.spill to true
> again in the config"? The default for spark.shuffle.spill is set to true
> according to the doc
> (https://spark.apache.org/docs/0.9.1/configuration.html).
>
> On OOM the tasks were process_local, which I understand is "as good as
> it gets" but still going on 32+ hours.
>
> On Wed, Jul 2, 2014 at 2:40 AM, Mayur Rustagi <ma...@gmail.com>
> wrote:
> >
> >
> > Mayur Rustagi
> > Ph: +1 (760) 203 3257
> > http://www.sigmoidanalytics.com
> > @mayur_rustagi
> >
> >
> >
> > On Mon, Jun 30, 2014 at 8:09 PM, Yana Kadiyska <ya...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> our cluster seems to have a really hard time with OOM errors on the
> >> executor. Periodically we'd see a task that gets sent to a few
> >> executors, one would OOM, and then the job just stays active for hours
> >> (sometimes 30+ whereas normally it completes sub-minute).
> >>
> >> So I have a few questions:
> >>
> >> 1. Why am I seeing OOMs to begin with?
> >>
> >> I'm running with defaults for
> >> spark.storage.memoryFraction
> >> spark.shuffle.memoryFraction
> >>
> >> so my understanding is that if Spark exceeds 60% of available memory,
> >> data will be spilled to disk? Am I misunderstanding this? In the
> >> attached screenshot, I see a single stage with 2 tasks on the same
> >> executor -- no disk spills but OOM.
> >
> > You need to configure spark.shuffle.spill to true again in the config.
> > As for what is causing you to OOM, it could be that you are simply doing
> > a sortByKey and the keys are bigger than the executor's memory, causing
> > the OOM. Can you post the stack trace?
> >>
> >>
> >> 2. How can I reduce the likelihood of seeing OOMs? I am a bit
> >> concerned that I don't see a spill at all, so I'm not sure whether
> >> decreasing spark.storage.memoryFraction is what needs to be done.
> >
> >
> >>
> >>
> >> 3. Why does an OOM seem to break the executor so hopelessly? I am
> >> seeing times upwards of 30hrs once an OOM occurs. Why is that -- the
> >> task *should* take under a minute, so even if the whole RDD was
> >> recomputed from scratch, 30hrs is very mysterious to me. Hadoop can
> >> process this in about 10-15 minutes, so I imagine even if the whole
> >> job went to disk it should still not take more than an hour
> >
> > When an OOM occurs it can cause the RDD to spill to disk; the repeated
> > task may be forced to read the data from disk and cause the overall
> > slowdown, not to mention that the RDD may be sent to a different executor
> > to be processed. Are you seeing the slow tasks as process_local, or at
> > least node_local?
> >>
> >>
> >> Any insight into this would be much appreciated.
> >> Running Spark 0.9.1
> >
> >
>

Re: Help alleviating OOM errors

Posted by Yana Kadiyska <ya...@gmail.com>.
Can you elaborate why "You need to configure spark.shuffle.spill to true
again in the config"? The default for spark.shuffle.spill is set to true
according to the doc
(https://spark.apache.org/docs/0.9.1/configuration.html).

On OOM the tasks were process_local, which I understand is "as good as
it gets" but still going on 32+ hours.

On Wed, Jul 2, 2014 at 2:40 AM, Mayur Rustagi <ma...@gmail.com> wrote:
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
>
>
>
> On Mon, Jun 30, 2014 at 8:09 PM, Yana Kadiyska <ya...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> our cluster seems to have a really hard time with OOM errors on the
>> executor. Periodically we'd see a task that gets sent to a few
>> executors, one would OOM, and then the job just stays active for hours
>> (sometimes 30+ whereas normally it completes sub-minute).
>>
>> So I have a few questions:
>>
>> 1. Why am I seeing OOMs to begin with?
>>
>> I'm running with defaults for
>> spark.storage.memoryFraction
>> spark.shuffle.memoryFraction
>>
>> so my understanding is that if Spark exceeds 60% of available memory,
>> data will be spilled to disk? Am I misunderstanding this? In the
>> attached screenshot, I see a single stage with 2 tasks on the same
>> executor -- no disk spills but OOM.
>
> You need to configure spark.shuffle.spill to true again in the config.
> As for what is causing you to OOM, it could be that you are simply doing a
> sortByKey and the keys are bigger than the executor's memory, causing the
> OOM. Can you post the stack trace?
>>
>>
>> 2. How can I reduce the likelihood of seeing OOMs? I am a bit
>> concerned that I don't see a spill at all, so I'm not sure whether
>> decreasing spark.storage.memoryFraction is what needs to be done.
>
>
>>
>>
>> 3. Why does an OOM seem to break the executor so hopelessly? I am
>> seeing times upwards of 30hrs once an OOM occurs. Why is that -- the
>> task *should* take under a minute, so even if the whole RDD was
>> recomputed from scratch, 30hrs is very mysterious to me. Hadoop can
>> process this in about 10-15 minutes, so I imagine even if the whole
>> job went to disk it should still not take more than an hour
>
> When an OOM occurs it can cause the RDD to spill to disk; the repeated task
> may be forced to read the data from disk and cause the overall slowdown, not
> to mention that the RDD may be sent to a different executor to be processed.
> Are you seeing the slow tasks as process_local, or at least node_local?
>>
>>
>> Any insight into this would be much appreciated.
>> Running Spark 0.9.1
>
>