Posted to issues@spark.apache.org by "Kaushal Prajapati (JIRA)" <ji...@apache.org> on 2017/11/06 14:12:00 UTC
[jira] [Updated] (SPARK-22458) OutOfDirectMemoryError with Spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-22458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kaushal Prajapati updated SPARK-22458:
--------------------------------------
Description:
We have been using Spark 2.1 for the last 6 months to run multiple Spark jobs that run for 15 hours on 50+ TB of source data, successfully, with the configuration below.
spark.master yarn
spark.driver.cores 10
spark.driver.maxResultSize 5g
spark.driver.memory 20g
spark.executor.cores 5
spark.executor.extraJavaOptions -XX:+UseG1GC *-Dio.netty.maxDirectMemory=1024* -XX:MaxGCPauseMillis=60000 *-XX:MaxDirectMemorySize=2048m* -Dlog4j.configuration=file:///conf/log4j.properties -Dhdp.version=2.5.3.0-37
spark.driver.extraJavaOptions * -Dio.netty.maxDirectMemory=2048 -XX:MaxDirectMemorySize=2048m* -Dlog4j.configuration=file:///conf/log4j.properties -Dhdp.version=2.5.3.0-37
spark.executor.instances 30
spark.executor.memory 30g
*spark.kryoserializer.buffer.max 512m*
spark.network.timeout 12000s
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.shuffle.io.preferDirectBufs false
spark.sql.catalogImplementation hive
spark.sql.shuffle.partitions 5000
spark.yarn.driver.memoryOverhead 1536
spark.yarn.executor.memoryOverhead 4096
spark.core.connection.ack.wait.timeout 600s
spark.scheduler.maxRegisteredResourcesWaitingTime 15s
spark.sql.hive.filesourcePartitionFileCacheSize 524288000
spark.dynamicAllocation.executorIdleTimeout 30000s
spark.dynamicAllocation.enabled true
spark.hadoop.yarn.timeline-service.enabled false
spark.shuffle.service.enabled true
spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.3.0-37 * -Dio.netty.maxDirectMemory=1024 -XX:MaxDirectMemorySize=1024m*
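To compare the direct-memory settings of the executor, driver, and application master at a glance, the two relevant flags can be pulled out of the option strings. This is a minimal Python sketch, not part of the original report: `direct_memory_flags` is a hypothetical helper, the option strings are trimmed copies of the ones above, and the asterisks it strips are the JIRA bold markers from the listing rather than part of the JVM options.

```python
import re

# Option strings copied from the configuration above (log4j/hdp options omitted).
# The asterisks are JIRA bold markup, not JVM syntax.
OPTIONS = {
    "executor": "-XX:+UseG1GC *-Dio.netty.maxDirectMemory=1024* "
                "-XX:MaxGCPauseMillis=60000 *-XX:MaxDirectMemorySize=2048m*",
    "driver": "* -Dio.netty.maxDirectMemory=2048 -XX:MaxDirectMemorySize=2048m*",
    "am": "* -Dio.netty.maxDirectMemory=1024 -XX:MaxDirectMemorySize=1024m*",
}

FLAG_PATTERN = re.compile(
    r"-(?:D|XX:)(io\.netty\.maxDirectMemory|MaxDirectMemorySize)=(\S+)"
)

def direct_memory_flags(option_string):
    """Return the direct-memory flags found in a JVM option string (hypothetical helper)."""
    cleaned = option_string.replace("*", " ")  # drop JIRA bold markers
    flags = {}
    for token in cleaned.split():
        match = FLAG_PATTERN.fullmatch(token)
        if match:
            flags[match.group(1)] = match.group(2)
    return flags

for role, opts in OPTIONS.items():
    print(role, direct_memory_flags(opts))
```

The values are printed exactly as written in the configuration, without interpreting their units; the executor is the only role where the two flags carry different numbers (1024 and 2048m).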
Recently we tried to upgrade from Spark 2.1 to Spark 2.2 to pick up fixes in the latest version, but we started hitting direct buffer out-of-memory errors and executors being killed for exceeding the memoryOverhead limit. We tried tweaking multiple properties to fix this, but the issue persists. The relevant information is shared below.
Please let me know if any other details are required.
Stack trace for the direct memory error:
10:48:26.417 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 5.0 in stage 5.3 (TID 25022, dedwdprshc070.de.xxxxxxx.com, executor 615): FetchFailed(BlockManagerId(465, dedwdprshc061.de.xxxxxxx.com, 7337, None), shuffleId=7, mapId=141, reduceId=3372, message=
org.apache.spark.shuffle.FetchFailedException: failed to allocate 65536 byte(s) of direct memory (used: 1073699840, max: 1073741824)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:59)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$2.hasNext(WholeStageCodegenExec.scala:414)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 65536 byte(s) of direct memory (used: 1073699840, max: 1073741824)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:530)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:484)
at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.allocateDirect(UnpooledUnsafeNoCleanerDirectByteBuf.java:30)
at io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:67)
at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.<init>(UnpooledUnsafeNoCleanerDirectByteBuf.java:25)
at io.netty.buffer.UnsafeByteBufUtil.newUnsafeDirectByteBuf(UnsafeByteBufUtil.java:425)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:299)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:129)
at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
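The numbers in the error message can be checked directly. The sketch below is illustrative Python, not Spark or Netty code, and `direct_memory_headroom` is a hypothetical helper. It parses the requested size, `used`, and `max` from the message and compares the remaining headroom with the request.

```python
import re

# Error text copied from the stack trace above.
MESSAGE = (
    "failed to allocate 65536 byte(s) of direct memory "
    "(used: 1073699840, max: 1073741824)"
)

def direct_memory_headroom(message):
    """Return (requested, used, limit, headroom) in bytes (hypothetical helper)."""
    match = re.search(
        r"failed to allocate (\d+) byte\(s\) of direct memory "
        r"\(used: (\d+), max: (\d+)\)",
        message,
    )
    if match is None:
        raise ValueError("unrecognized message format")
    requested, used, limit = (int(group) for group in match.groups())
    return requested, used, limit, limit - used

requested, used, limit, headroom = direct_memory_headroom(MESSAGE)
print(f"limit    = {limit} bytes ({limit / 2**30:.0f} GiB)")
print(f"headroom = {headroom} bytes")
print(f"request  = {requested} bytes, fits: {requested <= headroom}")
```

The limit is exactly 1 GiB, and only 41984 bytes remain under it, which is less than the 65536-byte request, so the allocation is rejected.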
If I remove the Netty configuration above, I get the error below instead.
Stack trace for exceeding the memory overhead:
{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3372 in stage 5.0 failed 4 times, most recent failure: Lost task 3372.3 in stage 5.0 (TID 19534, dedwfprshd006.de.xxxxxxx.com, executor 125): ExecutorLostFailure (executor 125 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 37.1 GB of 34 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
... 49 more
{code}
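The 34 GB limit in the YARN message is consistent with the configuration above: `spark.executor.memory 30g` plus `spark.yarn.executor.memoryOverhead 4096`, a value Spark reads in megabytes. A small Python check of that arithmetic, as a sketch that ignores any rounding YARN may apply to container sizes:

```python
# Values taken from the configuration and the YARN message above.
EXECUTOR_MEMORY_GIB = 30      # spark.executor.memory 30g
MEMORY_OVERHEAD_MIB = 4096    # spark.yarn.executor.memoryOverhead 4096
REPORTED_USAGE_GB = 37.1      # "37.1 GB of 34 GB physical memory used"

# Container size requested from YARN: heap plus overhead.
container_limit_mib = EXECUTOR_MEMORY_GIB * 1024 + MEMORY_OVERHEAD_MIB
container_limit_gib = container_limit_mib / 1024

# How far the container went over its limit.
overshoot_gb = REPORTED_USAGE_GB - container_limit_gib

print(f"container limit: {container_limit_gib:.0f} GB")
print(f"overshoot: {overshoot_gb:.1f} GB")
```

The container exceeded its limit by about 3.1 GB. Since the Java heap is capped at 30g, at least about 7.1 GB of the reported usage would have to come from outside the heap, which is well above the 4096 MB overhead allowance.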
> OutOfDirectMemoryError with Spark 2.2
> -------------------------------------
>
> Key: SPARK-22458
> URL: https://issues.apache.org/jira/browse/SPARK-22458
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, SQL, YARN
> Affects Versions: 2.2.0
> Reporter: Kaushal Prajapati
> Priority: Blocker