Posted to dev@spark.apache.org by Sushrut Ikhar <su...@gmail.com> on 2016/10/28 14:27:08 UTC

MemoryStore reporting wrong free memory in spark 1.6.2

Hi,
I am seeing an incorrect computation of available storage memory, which is
leading to executor failures. I have allocated 8g of executor memory with these
params:
spark.memory.fraction=0.7
spark.memory.storageFraction=0.4
As expected, the UI shows 5.2 GB of storage memory.
However, per the MemoryStore logs, free memory is increasing as RDDs are
cached, when it should be decreasing from 5.2 GB toward 0.

Eventually, the executor runs out of memory, even though the MemoryStore
reports 5 GB available.
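For reference, the 5.2 GB figure can be roughly sanity-checked against the 1.6
unified memory model. The sketch below is not Spark code: the 300 MB reserved
constant and treating the UI's "storage memory" as the whole unified pool are
assumptions from the 1.6 memory manager, and the real figure lands a bit below
the raw arithmetic because Runtime.getRuntime.maxMemory reports somewhat less
than -Xmx.

```scala
// Back-of-the-envelope estimate of the storage memory shown in the UI,
// assuming the Spark 1.6+ unified memory model.
object StorageMemoryEstimate {
  val ReservedMb = 300.0 // assumed reserved system memory in the 1.6 model

  /** Unified (storage + execution) pool in MB for a given heap size. */
  def unifiedPoolMb(heapMb: Double, memoryFraction: Double): Double =
    (heapMb - ReservedMb) * memoryFraction

  def main(args: Array[String]): Unit = {
    val pool = unifiedPoolMb(8 * 1024.0, 0.7) // 8g executor, fraction 0.7
    // storageFraction only marks the evict-proof region inside the pool;
    // storage can borrow beyond it, so the UI reports the whole pool.
    val storageFloor = pool * 0.4
    println(f"unified pool  ~ ${pool / 1024}%.2f GB")
    println(f"storage floor ~ ${storageFloor / 1024}%.2f GB")
  }
}
```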


Executor logs -

2016-10-28 12:36:51,996 INFO  [Executor task launch worker-10]
executor.Executor (Logging.scala:logInfo(58)) - Running task 324.0 in stage
1.0 (TID 760)
2016-10-28 12:36:52,019 INFO  [Executor task launch worker-10]
spark.CacheManager (Logging.scala:logInfo(58)) - Partition rdd_10_324 not
found, computing it
2016-10-28 12:36:52,031 INFO  [Executor task launch worker-8]
storage.MemoryStore (Logging.scala:logInfo(58)) - *Block rdd_10_145 stored
as values in memory (estimated size 65.0 MB, free 260.7 MB)*
2016-10-28 12:36:52,062 INFO  [Executor task launch worker-10]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Getting
426 non-empty blocks out of 426 blocks
2016-10-28 12:36:52,109 INFO  [Executor task launch worker-10]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Started
37 remote fetches in 47 ms
2016-10-28 12:36:52,130 INFO  [Executor task launch worker-8]
executor.Executor (Logging.scala:logInfo(58)) - Finished task 145.0 in
stage 1.0 (TID 581). 268313 bytes result sent to driver
2016-10-28 12:36:52,150 INFO  [dispatcher-event-loop-7]
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(58)) - Got
assigned task 761
2016-10-28 12:36:52,150 INFO  [Executor task launch worker-8]
executor.Executor (Logging.scala:logInfo(58)) - Running task 325.0 in stage
1.0 (TID 761)
2016-10-28 12:36:52,164 INFO  [Executor task launch worker-8]
spark.CacheManager (Logging.scala:logInfo(58)) - Partition rdd_10_325 not
found, computing it
2016-10-28 12:36:52,247 INFO  [Executor task launch worker-8]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Getting
426 non-empty blocks out of 426 blocks
2016-10-28 12:36:52,264 INFO  [Executor task launch worker-8]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Started
37 remote fetches in 18 ms
2016-10-28 12:36:52,591 INFO  [Executor task launch worker-6]
storage.MemoryStore (Logging.scala:logInfo(58)) - *Block rdd_10_45 stored
as values in memory (estimated size 65.0 MB, free 325.7 MB)*
2016-10-28 12:36:52,646 INFO  [Executor task launch worker-6]
executor.Executor (Logging.scala:logInfo(58)) - Finished task 45.0 in stage
1.0 (TID 481). 266368 bytes result sent to driver

Eventual failure logs -

2016-10-28 12:53:06,718 WARN  [Executor task launch worker-13]
storage.MemoryStore (Logging.scala:logWarning(70)) - Not enough space to
cache rdd_10_656 in memory! (computed 45.2 MB so far)
2016-10-28 12:53:06,718 INFO  [Executor task launch worker-13]
storage.MemoryStore (Logging.scala:logInfo(58)) - Memory use = 5.0 GB
(blocks) + 211.4 MB (scratch space shared across 103 tasks(s)) = 5.2 GB.
Storage limit = 5.2 GB.
2016-10-28 12:53:06,718 INFO  [Executor task launch worker-13]
storage.BlockManager (Logging.scala:logInfo(58)) - Found block rdd_10_656
locally
2016-10-28 12:53:06,719 INFO  [Executor task launch worker-12]
storage.MemoryStore (Logging.scala:logInfo(58)) - 1 blocks selected for
dropping
2016-10-28 12:53:06,720 INFO  [Executor task launch worker-12]
storage.BlockManager (Logging.scala:logInfo(58)) - Dropping block
rdd_10_719 from memory
2016-10-28 12:53:06,720 INFO  [Executor task launch worker-12]
storage.BlockManager (Logging.scala:logInfo(58)) - Writing block rdd_10_719
to disk
2016-10-28 12:53:06,736 ERROR [Executor task launch worker-15]
executor.Executor (Logging.scala:logError(95)) - Exception in task 657.0 in
stage 4.0 (TID 4565)
java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got
85027
at
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)


Is this a bug, or am I setting something wrong?

Regards,

Sushrut Ikhar
about.me/sushrutikhar <https://about.me/sushrutikhar?promo=email_sig>

Re: MemoryStore reporting wrong free memory in spark 1.6.2

Posted by Sushrut Ikhar <su...@gmail.com>.
Found the reporting bug in 1.6.2:
Utils.bytesToString(maxMemory - blocksMemoryUsed) should have been used, as
2.0.1 does -

https://github.com/apache/spark/blob/v2.0.1/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L154

https://github.com/apache/spark/blob/v1.6.2/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L396
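To make the fix concrete, here is a minimal, self-contained sketch (with
hypothetical sizes; this is not the actual MemoryStore source) of the corrected
computation: the logged "free" figure is derived as the storage limit minus the
bytes actually held by cached blocks, so it decreases as blocks are stored
instead of increasing.

```scala
// Sketch of the corrected "free memory" accounting for the MemoryStore log line.
object FreeMemoryLogSketch {
  def bytesToString(bytes: Long): String = f"${bytes / (1024.0 * 1024)}%.1f MB"

  /** Corrected figure: storage limit minus bytes held by cached blocks. */
  def freeAfterCaching(maxMemory: Long, blocksMemoryUsed: Long): Long =
    maxMemory - blocksMemoryUsed

  def main(args: Array[String]): Unit = {
    val maxMemory = 5L * 1024 * 1024 * 1024 // hypothetical 5 GB storage limit
    var used = 0L
    for (i <- 1 to 3) { // cache three hypothetical 65 MB blocks
      used += 65L * 1024 * 1024
      println(s"Block rdd_10_$i stored as values in memory " +
        s"(free ${bytesToString(freeAfterCaching(maxMemory, used))})")
    }
  }
}
```

With this accounting, each successive "stored" line reports a smaller free
figure, matching what the UI and the eventual "Not enough space" warning imply.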

Regards,

Sushrut Ikhar
about.me/sushrutikhar <https://about.me/sushrutikhar?promo=email_sig>


On Fri, Oct 28, 2016 at 7:57 PM, Sushrut Ikhar <su...@gmail.com>
wrote:
