Posted to user@spark.apache.org by Aniket Bhatnagar <an...@gmail.com> on 2016/11/24 16:16:57 UTC

OS killing Executor due to high (possibly off heap) memory usage

Hi Spark users

I am running a job that joins a huge dataset (7 TB+), and the executors keep
crashing randomly, eventually causing the job to crash. There are no out of
memory exceptions in the logs, and looking at the dmesg output, it seems the
OS killed the JVM because of high memory usage. My suspicion is that off heap
memory usage by the executor is causing this, as I am limiting the executor's
on heap usage to 46 GB and each host running an executor has 60 GB of RAM.
After an executor crashes, I can see that the external shuffle manager
(org.apache.spark.network.server.TransportRequestHandler) logs a lot of
channel closed exceptions in the yarn node manager logs. This leads me to
believe that something triggers an out of memory condition during shuffle
read. Is there a configuration to completely disable usage of off heap
memory? I have tried setting spark.shuffle.io.preferDirectBufs=false but the
executor is still getting killed with the same error.
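
For reference, this is roughly how I am wiring that flag in (a sketch only,
assuming Spark 2.x and a SparkSession based job; the memory and allocation
settings themselves are supplied at submit time, as listed under "Spark
configuration" below):

  import org.apache.spark.SparkConf
  import org.apache.spark.sql.SparkSession

  // Attempt to steer the shuffle networking layer away from direct (off heap) buffers
  val conf = new SparkConf()
    .set("spark.shuffle.io.preferDirectBufs", "false")
  val spark = SparkSession.builder().config(conf).getOrCreate()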

Cluster details:
10 AWS c4.8xlarge hosts
RAM on each host - 60 GB
Number of cores on each host - 36
Additional hard disk on each host - 8 TB

Spark configuration:
dynamic allocation enabled
external shuffle service enabled
spark.driver.memory 1024M
spark.executor.memory 47127M
Spark master yarn-cluster

Sample error in yarn node manager:
2016-11-24 10:34:06,507 ERROR
org.apache.spark.network.server.TransportRequestHandler
(shuffle-server-50): Error sending result
ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=919299554123,
chunkIndex=0},
buffer=FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-9276971b2259/2c/shuffle_3_963_0.data,
offset=0, length=669014456}} to /10.192.108.170:52782; closing connection
java.nio.channels.ClosedChannelException

Error in dmesg:
[799873.309897] Out of memory: Kill process 50001 (java) score 927 or
sacrifice child
[799873.314439] Killed process 50001 (java) total-vm:65652448kB,
anon-rss:57246528kB, file-rss:0kB
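
A quick back-of-the-envelope check on those dmesg numbers (a sketch, just
arithmetic on the values above):

  // kB reported by the OOM killer vs the configured executor heap
  val residentMb = 57246528 / 1024      // ~55904 MB actually resident
  val heapMb     = 47127                // spark.executor.memory
  val gapMb      = residentMb - heapMb  // ~8777 MB apparently living outside the heap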

Thanks,
Aniket

Re: OS killing Executor due to high (possibly off heap) memory usage

Posted by Koert Kuipers <ko...@tresata.com>.
it would be great if this offheap memory usage becomes more predictable
again.
currently i see users set memoryOverhead to many gigabytes, sometimes as much
as the executor memory itself. it is trial and error to find out what the
right number is, so people don't bother and just put in huge numbers instead.
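
for reference, a sketch of the default overhead calculation as i understand
it for spark 2.x on yarn (constants may differ by version):

  // default spark.yarn.executor.memoryOverhead: max(384 MB, 10% of executor memory)
  val executorMemoryMb = 47127
  val overheadMb  = math.max(384, (0.10 * executorMemoryMb).toInt) // ~4712 MB
  val containerMb = executorMemoryMb + overheadMb                  // ~51839 MB requested from yarn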

On Thu, Dec 8, 2016 at 11:53 AM, Aniket Bhatnagar <
aniket.bhatnagar@gmail.com> wrote:

> I did some instrumentation to figure out traces of where DirectByteBuffers
> are being created and it turns out that setting the following system
> properties in addition to setting spark.shuffle.io.preferDirectBufs=false
> in spark config:
>
> io.netty.noUnsafe=true
> io.netty.threadLocalDirectBufferSize=0
>
> This should force netty to mostly use on heap buffers and thus increases
> the stability of spark jobs that perform a lot of shuffle. I have created
> the defect SPARK-18787 to either force these settings when
> spark.shuffle.io.preferDirectBufs=false is set in spark config or
> document it.
>
> Hope it will be helpful for other users as well.
>
> Thanks,
> Aniket
>
> On Sat, Nov 26, 2016 at 3:31 PM Koert Kuipers <ko...@tresata.com> wrote:
>
>> i agree that offheap memory usage is unpredictable.
>>
>> when we used rdds the memory was mostly on heap and total usage
>> predictable, and we almost never had yarn killing executors.
>>
>> now with dataframes the memory usage is both on and off heap, and we have
>> no way of limiting the off heap memory usage by spark, yet yarn requires a
>> maximum total memory usage and if you go over it yarn kills the executor.
>>
>> On Fri, Nov 25, 2016 at 12:14 PM, Aniket Bhatnagar <
>> aniket.bhatnagar@gmail.com> wrote:
>>
>> Thanks Rohit, Roddick and Shreya. I tried changing spark.yarn.executor.memoryOverhead
>> to be 10GB and lowering executor memory to 30 GB and both of these didn't
>> work. I finally had to reduce the number of cores per executor to be 18
>> (from 36) in addition to setting higher spark.yarn.executor.memoryOverhead
>> and lower executor memory size. I had to trade off performance for
>> reliability.
>>
>> Unfortunately, spark does a poor job reporting off heap memory usage.
>> From the profiler, it seems that the job's heap usage is pretty static but
>> the off heap memory fluctuates quiet a lot. It looks like bulk of off heap
>> is used by io.netty.buffer.UnpooledUnsafeDirectByteBuf while the shuffle
>> client is trying to read block from shuffle service. It looks
>> like org.apache.spark.network.util.TransportFrameDecoder retains them
>> in buffers field while decoding responses from the shuffle service. So far,
>> it's not clear why it needs to hold multiple GBs in the buffers. Perhaps
>> increasing the number of partitions may help with this.
>>
>> Thanks,
>> Aniket
>>
>> On Fri, Nov 25, 2016 at 1:09 AM Shreya Agarwal <sh...@microsoft.com>
>> wrote:
>>
>> I don’t think it’s just memory overhead. It might be better to use an
>> execute with lesser heap space(30GB?). 46 GB would mean more data load into
>> memory and more GC, which can cause issues.
>>
>>
>>
>> Also, have you tried to persist data in any way? If so, then that might
>> be causing an issue.
>>
>>
>>
>> Lastly, I am not sure if your data has a skew and if that is forcing a
>> lot of data to be on one executor node.
>>
>>
>>
>> Sent from my Windows 10 phone
>>
>>
>>
>> *From: *Rodrick Brown <ro...@orchardplatform.com>
>> *Sent: *Friday, November 25, 2016 12:25 AM
>> *To: *Aniket Bhatnagar <an...@gmail.com>
>> *Cc: *user <us...@spark.apache.org>
>> *Subject: *Re: OS killing Executor due to high (possibly off heap)
>> memory usage
>>
>>
>> Try setting spark.yarn.executor.memoryOverhead 10000
>>
>> On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar <
>> aniket.bhatnagar@gmail.com> wrote:
>>
>> Hi Spark users
>>
>> I am running a job that does join of a huge dataset (7 TB+) and the
>> executors keep crashing randomly, eventually causing the job to crash.
>> There are no out of memory exceptions in the log and looking at the dmesg
>> output, it seems like the OS killed the JVM because of high memory usage.
>> My suspicion is towards off heap usage of executor is causing this as I am
>> limiting the on heap usage of executor to be 46 GB and each host running
>> the executor has 60 GB of RAM. After the executor crashes, I can see that
>> the external shuffle manager (org.apache.spark.network.server.TransportRequestHandler)
>> logs a lot of channel closed exceptions in yarn node manager logs. This
>> leads me to believe that something triggers out of memory during shuffle
>> read. Is there a configuration to completely disable usage of off heap
>> memory? I have tried setting spark.shuffle.io.preferDirectBufs=false but
>> the executor is still getting killed by the same error.
>>
>> Cluster details:
>> 10 AWS c4.8xlarge hosts
>> RAM on each host - 60 GB
>> Number of cores on each host - 36
>> Additional hard disk on each host - 8 TB
>>
>> Spark configuration:
>> dynamic allocation enabled
>> external shuffle service enabled
>> spark.driver.memory 1024M
>> spark.executor.memory 47127M
>> Spark master yarn-cluster
>>
>> Sample error in yarn node manager:
>> 2016-11-24 10:34:06,507 ERROR org.apache.spark.network.server.TransportRequestHandler
>> (shuffle-server-50): Error sending result ChunkFetchSuccess{
>> streamChunkId=StreamChunkId{streamId=919299554123, chunkIndex=0}, buffer=
>> FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/
>> appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-
>> 9276971b2259/2c/shuffle_3_963_0.data, offset=0, length=669014456}} to /
>> 10.192.108.170:52782; closing connection
>> java.nio.channels.ClosedChannelException
>>
>> Error in dmesg:
>> [799873.309897] Out of memory: Kill process 50001 (java) score 927 or
>> sacrifice child
>> [799873.314439] Killed process 50001 (java) total-vm:65652448kB,
>> anon-rss:57246528kB, file-rss:0kB
>>
>> Thanks,
>> Aniket
>>
>>
>>
>>
>> --
>>
>> [image: Orchard Platform] <http://www.orchardplatform.com/>
>>
>> *Rodrick Brown */ *DevOPs*
>>
>> 9174456839 / rodrick@orchardplatform.com
>>
>> Orchard Platform
>> 101 5th Avenue, 4th Floor, New York, NY
>>
>>
>>

Re: OS killing Executor due to high (possibly off heap) memory usage

Posted by Aniket Bhatnagar <an...@gmail.com>.
I did some instrumentation to trace where DirectByteBuffers are being created,
and it turns out that setting the following system properties, in addition to
setting spark.shuffle.io.preferDirectBufs=false in the spark config, helps:

io.netty.noUnsafe=true
io.netty.threadLocalDirectBufferSize=0

This should force netty to mostly use on heap buffers and thus increase the
stability of spark jobs that perform a lot of shuffle. I have created the
defect SPARK-18787 to either force these settings when
spark.shuffle.io.preferDirectBufs=false is set in the spark config or to
document them.
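
For reference, this is roughly how I am passing the properties to the
executor JVMs (a sketch; spark.executor.extraJavaOptions is the standard key
for -D flags, the driver can take the same flags via
spark.driver.extraJavaOptions at submit time, and the external shuffle service
runs inside the yarn node manager, so its JVM needs the flags set separately):

  // Sketch: netty flags as JVM system properties on the executors
  val conf = new org.apache.spark.SparkConf()
    .set("spark.shuffle.io.preferDirectBufs", "false")
    .set("spark.executor.extraJavaOptions",
      "-Dio.netty.noUnsafe=true -Dio.netty.threadLocalDirectBufferSize=0")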

Hope it will be helpful for other users as well.

Thanks,
Aniket

On Sat, Nov 26, 2016 at 3:31 PM Koert Kuipers <ko...@tresata.com> wrote:

> i agree that offheap memory usage is unpredictable.
>
> when we used rdds the memory was mostly on heap and total usage
> predictable, and we almost never had yarn killing executors.
>
> now with dataframes the memory usage is both on and off heap, and we have
> no way of limiting the off heap memory usage by spark, yet yarn requires a
> maximum total memory usage and if you go over it yarn kills the executor.
>
> On Fri, Nov 25, 2016 at 12:14 PM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
> Thanks Rohit, Roddick and Shreya. I tried
> changing spark.yarn.executor.memoryOverhead to be 10GB and lowering
> executor memory to 30 GB and both of these didn't work. I finally had to
> reduce the number of cores per executor to be 18 (from 36) in addition to
> setting higher spark.yarn.executor.memoryOverhead and lower executor memory
> size. I had to trade off performance for reliability.
>
> Unfortunately, spark does a poor job reporting off heap memory usage. From
> the profiler, it seems that the job's heap usage is pretty static but the
> off heap memory fluctuates quiet a lot. It looks like bulk of off heap is
> used by io.netty.buffer.UnpooledUnsafeDirectByteBuf while the shuffle
> client is trying to read block from shuffle service. It looks
> like org.apache.spark.network.util.TransportFrameDecoder retains them
> in buffers field while decoding responses from the shuffle service. So far,
> it's not clear why it needs to hold multiple GBs in the buffers. Perhaps
> increasing the number of partitions may help with this.
>
> Thanks,
> Aniket
>
> On Fri, Nov 25, 2016 at 1:09 AM Shreya Agarwal <sh...@microsoft.com>
> wrote:
>
> I don’t think it’s just memory overhead. It might be better to use an
> execute with lesser heap space(30GB?). 46 GB would mean more data load into
> memory and more GC, which can cause issues.
>
>
>
> Also, have you tried to persist data in any way? If so, then that might be
> causing an issue.
>
>
>
> Lastly, I am not sure if your data has a skew and if that is forcing a lot
> of data to be on one executor node.
>
>
>
> Sent from my Windows 10 phone
>
>
>
> *From: *Rodrick Brown <ro...@orchardplatform.com>
> *Sent: *Friday, November 25, 2016 12:25 AM
> *To: *Aniket Bhatnagar <an...@gmail.com>
> *Cc: *user <us...@spark.apache.org>
> *Subject: *Re: OS killing Executor due to high (possibly off heap) memory
> usage
>
>
> Try setting spark.yarn.executor.memoryOverhead 10000
>
> On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
> Hi Spark users
>
> I am running a job that does join of a huge dataset (7 TB+) and the
> executors keep crashing randomly, eventually causing the job to crash.
> There are no out of memory exceptions in the log and looking at the dmesg
> output, it seems like the OS killed the JVM because of high memory usage.
> My suspicion is towards off heap usage of executor is causing this as I am
> limiting the on heap usage of executor to be 46 GB and each host running
> the executor has 60 GB of RAM. After the executor crashes, I can see that
> the external shuffle manager
> (org.apache.spark.network.server.TransportRequestHandler) logs a lot of
> channel closed exceptions in yarn node manager logs. This leads me to
> believe that something triggers out of memory during shuffle read. Is there
> a configuration to completely disable usage of off heap memory? I have
> tried setting spark.shuffle.io.preferDirectBufs=false but the executor is
> still getting killed by the same error.
>
> Cluster details:
> 10 AWS c4.8xlarge hosts
> RAM on each host - 60 GB
> Number of cores on each host - 36
> Additional hard disk on each host - 8 TB
>
> Spark configuration:
> dynamic allocation enabled
> external shuffle service enabled
> spark.driver.memory 1024M
> spark.executor.memory 47127M
> Spark master yarn-cluster
>
> Sample error in yarn node manager:
> 2016-11-24 10:34:06,507 ERROR
> org.apache.spark.network.server.TransportRequestHandler
> (shuffle-server-50): Error sending result
> ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=919299554123,
> chunkIndex=0},
> buffer=FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-9276971b2259/2c/shuffle_3_963_0.data,
> offset=0, length=669014456}} to /10.192.108.170:52782; closing connection
> java.nio.channels.ClosedChannelException
>
> Error in dmesg:
> [799873.309897] Out of memory: Kill process 50001 (java) score 927 or
> sacrifice child
> [799873.314439] Killed process 50001 (java) total-vm:65652448kB,
> anon-rss:57246528kB, file-rss:0kB
>
> Thanks,
> Aniket
>
>
>
>
> --
>
> [image: Orchard Platform] <http://www.orchardplatform.com/>
>
> *Rodrick Brown */ *DevOPs*
>
> 9174456839 / rodrick@orchardplatform.com
>
> Orchard Platform
> 101 5th Avenue, 4th Floor, New York, NY
>
>
>
>

Re: OS killing Executor due to high (possibly off heap) memory usage

Posted by Koert Kuipers <ko...@tresata.com>.
i agree that offheap memory usage is unpredictable.

when we used rdds the memory was mostly on heap and total usage was
predictable, and we almost never had yarn killing executors.

now with dataframes the memory usage is both on and off heap, and we have no
way of limiting the off heap memory usage by spark, yet yarn enforces a
maximum total memory usage and kills the executor if you go over it.
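
as a concrete illustration with the numbers from this thread (a sketch; the
overhead figure assumes the ~10% default):

  // the container size is the only ceiling yarn knows about
  val executorMemoryMb = 47127
  val overheadMb       = 4712                           // ~10% default memoryOverhead
  val containerLimitMb = executorMemoryMb + overheadMb  // ~51839 MB hard ceiling
  // whatever netty/tungsten allocates off heap has to fit in that ~4.7 GB gap,
  // otherwise the container gets killed (or, as here, the host itself runs dry)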

On Fri, Nov 25, 2016 at 12:14 PM, Aniket Bhatnagar <
aniket.bhatnagar@gmail.com> wrote:

> Thanks Rohit, Roddick and Shreya. I tried changing spark.yarn.executor.memoryOverhead
> to be 10GB and lowering executor memory to 30 GB and both of these didn't
> work. I finally had to reduce the number of cores per executor to be 18
> (from 36) in addition to setting higher spark.yarn.executor.memoryOverhead
> and lower executor memory size. I had to trade off performance for
> reliability.
>
> Unfortunately, spark does a poor job reporting off heap memory usage. From
> the profiler, it seems that the job's heap usage is pretty static but the
> off heap memory fluctuates quiet a lot. It looks like bulk of off heap is
> used by io.netty.buffer.UnpooledUnsafeDirectByteBuf while the shuffle
> client is trying to read block from shuffle service. It looks
> like org.apache.spark.network.util.TransportFrameDecoder retains them
> in buffers field while decoding responses from the shuffle service. So far,
> it's not clear why it needs to hold multiple GBs in the buffers. Perhaps
> increasing the number of partitions may help with this.
>
> Thanks,
> Aniket
>
> On Fri, Nov 25, 2016 at 1:09 AM Shreya Agarwal <sh...@microsoft.com>
> wrote:
>
> I don’t think it’s just memory overhead. It might be better to use an
> execute with lesser heap space(30GB?). 46 GB would mean more data load into
> memory and more GC, which can cause issues.
>
>
>
> Also, have you tried to persist data in any way? If so, then that might be
> causing an issue.
>
>
>
> Lastly, I am not sure if your data has a skew and if that is forcing a lot
> of data to be on one executor node.
>
>
>
> Sent from my Windows 10 phone
>
>
>
> *From: *Rodrick Brown <ro...@orchardplatform.com>
> *Sent: *Friday, November 25, 2016 12:25 AM
> *To: *Aniket Bhatnagar <an...@gmail.com>
> *Cc: *user <us...@spark.apache.org>
> *Subject: *Re: OS killing Executor due to high (possibly off heap) memory
> usage
>
>
> Try setting spark.yarn.executor.memoryOverhead 10000
>
> On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
> Hi Spark users
>
> I am running a job that does join of a huge dataset (7 TB+) and the
> executors keep crashing randomly, eventually causing the job to crash.
> There are no out of memory exceptions in the log and looking at the dmesg
> output, it seems like the OS killed the JVM because of high memory usage.
> My suspicion is towards off heap usage of executor is causing this as I am
> limiting the on heap usage of executor to be 46 GB and each host running
> the executor has 60 GB of RAM. After the executor crashes, I can see that
> the external shuffle manager (org.apache.spark.network.server.TransportRequestHandler)
> logs a lot of channel closed exceptions in yarn node manager logs. This
> leads me to believe that something triggers out of memory during shuffle
> read. Is there a configuration to completely disable usage of off heap
> memory? I have tried setting spark.shuffle.io.preferDirectBufs=false but
> the executor is still getting killed by the same error.
>
> Cluster details:
> 10 AWS c4.8xlarge hosts
> RAM on each host - 60 GB
> Number of cores on each host - 36
> Additional hard disk on each host - 8 TB
>
> Spark configuration:
> dynamic allocation enabled
> external shuffle service enabled
> spark.driver.memory 1024M
> spark.executor.memory 47127M
> Spark master yarn-cluster
>
> Sample error in yarn node manager:
> 2016-11-24 10:34:06,507 ERROR org.apache.spark.network.server.TransportRequestHandler
> (shuffle-server-50): Error sending result ChunkFetchSuccess{
> streamChunkId=StreamChunkId{streamId=919299554123, chunkIndex=0}, buffer=
> FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/
> appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-
> 9276971b2259/2c/shuffle_3_963_0.data, offset=0, length=669014456}} to /
> 10.192.108.170:52782; closing connection
> java.nio.channels.ClosedChannelException
>
> Error in dmesg:
> [799873.309897] Out of memory: Kill process 50001 (java) score 927 or
> sacrifice child
> [799873.314439] Killed process 50001 (java) total-vm:65652448kB,
> anon-rss:57246528kB, file-rss:0kB
>
> Thanks,
> Aniket
>
>
>
>
> --
>
> [image: Orchard Platform] <http://www.orchardplatform.com/>
>
> *Rodrick Brown */ *DevOPs*
>
> 9174456839 / rodrick@orchardplatform.com
>
> Orchard Platform
> 101 5th Avenue, 4th Floor, New York, NY
>
>
>

Re: OS killing Executor due to high (possibly off heap) memory usage

Posted by Aniket Bhatnagar <an...@gmail.com>.
Thanks Rohit, Rodrick and Shreya. I tried
changing spark.yarn.executor.memoryOverhead to 10 GB and lowering executor
memory to 30 GB, but neither worked. I finally had to reduce the number of
cores per executor to 18 (from 36) in addition to setting a higher
spark.yarn.executor.memoryOverhead and a lower executor memory. I had to
trade off performance for reliability.

Unfortunately, spark does a poor job of reporting off heap memory usage. From
the profiler, it seems that the job's heap usage is pretty static but the off
heap memory fluctuates quite a lot. It looks like the bulk of the off heap
memory is used by io.netty.buffer.UnpooledUnsafeDirectByteBuf while the
shuffle client is trying to read blocks from the shuffle service. It looks
like org.apache.spark.network.util.TransportFrameDecoder retains them in its
buffers field while decoding responses from the shuffle service. So far, it's
not clear why it needs to hold multiple GBs in those buffers. Perhaps
increasing the number of partitions will help with this.
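
The node manager error quoted below hints at why: the failed chunk had
length=669014456 (~638 MB), so even a handful of concurrent fetches of blocks
that size adds up to several GB of frame buffers. A sketch of what I mean by
more partitions (the inputs, paths and join column are illustrative, and the
partition count is not tuned):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().getOrCreate()
  // more, smaller shuffle blocks -> smaller frames for TransportFrameDecoder to hold
  spark.conf.set("spark.sql.shuffle.partitions", "4000")  // default is 200
  val bigA = spark.read.parquet("/data/a")                // hypothetical inputs
  val bigB = spark.read.parquet("/data/b")
  val joined = bigA.repartition(4000, bigA("joinKey"))    // "joinKey" is illustrative
    .join(bigB, Seq("joinKey"))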

Thanks,
Aniket

On Fri, Nov 25, 2016 at 1:09 AM Shreya Agarwal <sh...@microsoft.com>
wrote:

I don’t think it’s just memory overhead. It might be better to use an
execute with lesser heap space(30GB?). 46 GB would mean more data load into
memory and more GC, which can cause issues.



Also, have you tried to persist data in any way? If so, then that might be
causing an issue.



Lastly, I am not sure if your data has a skew and if that is forcing a lot
of data to be on one executor node.



Sent from my Windows 10 phone



*From: *Rodrick Brown <ro...@orchardplatform.com>
*Sent: *Friday, November 25, 2016 12:25 AM
*To: *Aniket Bhatnagar <an...@gmail.com>
*Cc: *user <us...@spark.apache.org>
*Subject: *Re: OS killing Executor due to high (possibly off heap) memory
usage


Try setting spark.yarn.executor.memoryOverhead 10000

On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar <
aniket.bhatnagar@gmail.com> wrote:

Hi Spark users

I am running a job that does join of a huge dataset (7 TB+) and the
executors keep crashing randomly, eventually causing the job to crash.
There are no out of memory exceptions in the log and looking at the dmesg
output, it seems like the OS killed the JVM because of high memory usage.
My suspicion is towards off heap usage of executor is causing this as I am
limiting the on heap usage of executor to be 46 GB and each host running
the executor has 60 GB of RAM. After the executor crashes, I can see that
the external shuffle manager
(org.apache.spark.network.server.TransportRequestHandler) logs a lot of
channel closed exceptions in yarn node manager logs. This leads me to
believe that something triggers out of memory during shuffle read. Is there
a configuration to completely disable usage of off heap memory? I have
tried setting spark.shuffle.io.preferDirectBufs=false but the executor is
still getting killed by the same error.

Cluster details:
10 AWS c4.8xlarge hosts
RAM on each host - 60 GB
Number of cores on each host - 36
Additional hard disk on each host - 8 TB

Spark configuration:
dynamic allocation enabled
external shuffle service enabled
spark.driver.memory 1024M
spark.executor.memory 47127M
Spark master yarn-cluster

Sample error in yarn node manager:
2016-11-24 10:34:06,507 ERROR
org.apache.spark.network.server.TransportRequestHandler
(shuffle-server-50): Error sending result
ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=919299554123,
chunkIndex=0},
buffer=FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-9276971b2259/2c/shuffle_3_963_0.data,
offset=0, length=669014456}} to /10.192.108.170:52782; closing connection
java.nio.channels.ClosedChannelException

Error in dmesg:
[799873.309897] Out of memory: Kill process 50001 (java) score 927 or
sacrifice child
[799873.314439] Killed process 50001 (java) total-vm:65652448kB,
anon-rss:57246528kB, file-rss:0kB

Thanks,
Aniket




-- 

[image: Orchard Platform] <http://www.orchardplatform.com/>

*Rodrick Brown */ *DevOPs*

9174456839 / rodrick@orchardplatform.com

Orchard Platform
101 5th Avenue, 4th Floor, New York, NY


RE: OS killing Executor due to high (possibly off heap) memory usage

Posted by Shreya Agarwal <sh...@microsoft.com>.
I don’t think it’s just memory overhead. It might be better to use an executor with less heap space (30 GB?). 46 GB would mean more data loaded into memory and more GC, which can cause issues.

Also, have you tried to persist data in any way? If so, then that might be causing an issue.

Lastly, I am not sure if your data has a skew that is forcing a lot of data onto one executor node.
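
A quick way to check for skew (a sketch; the input path and the join column
"joinKey" are placeholders for whatever you are joining on):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{col, count, desc}

  val spark = SparkSession.builder().getOrCreate()
  val df = spark.read.parquet("/path/to/one/side")  // placeholder input
  // count rows per join key; a few keys dominating the counts indicates skew
  df.groupBy(col("joinKey"))
    .agg(count("*").as("rows"))
    .orderBy(desc("rows"))
    .show(20, false)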

Sent from my Windows 10 phone

From: Rodrick Brown<ma...@orchardplatform.com>
Sent: Friday, November 25, 2016 12:25 AM
To: Aniket Bhatnagar<ma...@gmail.com>
Cc: user<ma...@spark.apache.org>
Subject: Re: OS killing Executor due to high (possibly off heap) memory usage

Try setting spark.yarn.executor.memoryOverhead 10000

On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar <an...@gmail.com>> wrote:
Hi Spark users

I am running a job that does join of a huge dataset (7 TB+) and the executors keep crashing randomly, eventually causing the job to crash. There are no out of memory exceptions in the log and looking at the dmesg output, it seems like the OS killed the JVM because of high memory usage. My suspicion is towards off heap usage of executor is causing this as I am limiting the on heap usage of executor to be 46 GB and each host running the executor has 60 GB of RAM. After the executor crashes, I can see that the external shuffle manager (org.apache.spark.network.server.TransportRequestHandler) logs a lot of channel closed exceptions in yarn node manager logs. This leads me to believe that something triggers out of memory during shuffle read. Is there a configuration to completely disable usage of off heap memory? I have tried setting spark.shuffle.io<http://spark.shuffle.io>.preferDirectBufs=false but the executor is still getting killed by the same error.

Cluster details:
10 AWS c4.8xlarge hosts
RAM on each host - 60 GB
Number of cores on each host - 36
Additional hard disk on each host - 8 TB

Spark configuration:
dynamic allocation enabled
external shuffle service enabled
spark.driver.memory 1024M
spark.executor.memory 47127M
Spark master yarn-cluster

Sample error in yarn node manager:
2016-11-24 10:34:06,507 ERROR org.apache.spark.network.server.TransportRequestHandler (shuffle-server-50): Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=919299554123, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-9276971b2259/2c/shuffle_3_963_0.data, offset=0, length=669014456}} to /10.192.108.170:52782<http://10.192.108.170:52782>; closing connection
java.nio.channels.ClosedChannelException

Error in dmesg:
[799873.309897] Out of memory: Kill process 50001 (java) score 927 or sacrifice child
[799873.314439] Killed process 50001 (java) total-vm:65652448kB, anon-rss:57246528kB, file-rss:0kB

Thanks,
Aniket



--

[Orchard Platform]<http://www.orchardplatform.com/>

Rodrick Brown / DevOPs

9174456839 / rodrick@orchardplatform.com<ma...@orchardplatform.com>

Orchard Platform
101 5th Avenue, 4th Floor, New York, NY


Re: OS killing Executor due to high (possibly off heap) memory usage

Posted by Rodrick Brown <ro...@orchardplatform.com>.
Try setting spark.yarn.executor.memoryOverhead to 10000 (MB).
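
For example (a sketch; setting it at submit time is the safer place for yarn
sizing):

  // extra headroom, in MB, that yarn reserves on top of the executor heap;
  // equivalently: --conf spark.yarn.executor.memoryOverhead=10000 on spark-submit
  val conf = new org.apache.spark.SparkConf()
    .set("spark.yarn.executor.memoryOverhead", "10000")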

On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar <
aniket.bhatnagar@gmail.com> wrote:

> Hi Spark users
>
> I am running a job that does join of a huge dataset (7 TB+) and the
> executors keep crashing randomly, eventually causing the job to crash.
> There are no out of memory exceptions in the log and looking at the dmesg
> output, it seems like the OS killed the JVM because of high memory usage.
> My suspicion is towards off heap usage of executor is causing this as I am
> limiting the on heap usage of executor to be 46 GB and each host running
> the executor has 60 GB of RAM. After the executor crashes, I can see that
> the external shuffle manager (org.apache.spark.network.server.TransportRequestHandler)
> logs a lot of channel closed exceptions in yarn node manager logs. This
> leads me to believe that something triggers out of memory during shuffle
> read. Is there a configuration to completely disable usage of off heap
> memory? I have tried setting spark.shuffle.io.preferDirectBufs=false but
> the executor is still getting killed by the same error.
>
> Cluster details:
> 10 AWS c4.8xlarge hosts
> RAM on each host - 60 GB
> Number of cores on each host - 36
> Additional hard disk on each host - 8 TB
>
> Spark configuration:
> dynamic allocation enabled
> external shuffle service enabled
> spark.driver.memory 1024M
> spark.executor.memory 47127M
> Spark master yarn-cluster
>
> Sample error in yarn node manager:
> 2016-11-24 10:34:06,507 ERROR org.apache.spark.network.server.TransportRequestHandler
> (shuffle-server-50): Error sending result ChunkFetchSuccess{
> streamChunkId=StreamChunkId{streamId=919299554123, chunkIndex=0}, buffer=
> FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/
> appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-
> 9276971b2259/2c/shuffle_3_963_0.data, offset=0, length=669014456}} to /
> 10.192.108.170:52782; closing connection
> java.nio.channels.ClosedChannelException
>
> Error in dmesg:
> [799873.309897] Out of memory: Kill process 50001 (java) score 927 or
> sacrifice child
> [799873.314439] Killed process 50001 (java) total-vm:65652448kB,
> anon-rss:57246528kB, file-rss:0kB
>
> Thanks,
> Aniket
>



-- 

[image: Orchard Platform] <http://www.orchardplatform.com/>

*Rodrick Brown */ *DevOPs*

9174456839 / rodrick@orchardplatform.com

Orchard Platform
101 5th Avenue, 4th Floor, New York, NY

-- 
*NOTICE TO RECIPIENTS*: This communication is confidential and intended for 
the use of the addressee only. If you are not an intended recipient of this 
communication, please delete it immediately and notify the sender by return 
email. Unauthorized reading, dissemination, distribution or copying of this 
communication is prohibited. This communication does not constitute an 
offer to sell or a solicitation of an indication of interest to purchase 
any loan, security or any other financial product or instrument, nor is it 
an offer to sell or a solicitation of an indication of interest to purchase 
any products or services to any persons who are prohibited from receiving 
such information under applicable law. The contents of this communication 
may not be accurate or complete and are subject to change without notice. 
As such, Orchard App, Inc. (including its subsidiaries and affiliates, 
"Orchard") makes no representation regarding the accuracy or completeness 
of the information contained herein. The intended recipient is advised to 
consult its own professional advisors, including those specializing in 
legal, tax and accounting matters. Orchard does not provide legal, tax or 
accounting advice.