Posted to issues@spark.apache.org by "Ernest (JIRA)" <ji...@apache.org> on 2016/03/22 03:26:25 UTC

[jira] [Created] (SPARK-14055) AssertionError may happen if writeLock is not released in 'removeBlock' method

Ernest created SPARK-14055:
------------------------------

             Summary: AssertionError may happen if writeLock is not released in 'removeBlock' method
                 Key: SPARK-14055
                 URL: https://issues.apache.org/jira/browse/SPARK-14055
             Project: Spark
          Issue Type: Bug
          Components: Block Manager, Spark Core
         Environment: Spark 2.0-SNAPSHOT
Single Rack
Standalone mode scheduling
8 node cluster
16 cores & 64G RAM / node
Data Replication factor of 2

Each node has 1 Spark executor configured with 16 cores and 40 GB of RAM.
            Reporter: Ernest
            Priority: Minor


We got the following log when running _LiveJournalPageRank_.
{quote}
452823:16/03/21 19:28:47.444 TRACE BlockInfoManager: Task 1662 trying to acquire write lock for rdd_3_183
452825:16/03/21 19:28:47.445 TRACE BlockInfoManager: Task 1662 acquired write lock for rdd_3_183
456941:16/03/21 19:28:47.596 INFO BlockManager: Dropping block rdd_3_183 from memory
456943:16/03/21 19:28:47.597 DEBUG MemoryStore: Block rdd_3_183 of size 418784648 dropped from memory (free 3504141600)
457027:16/03/21 19:28:47.600 DEBUG BlockManagerMaster: Updated info of block rdd_3_183
457053:16/03/21 19:28:47.600 DEBUG BlockManager: Told master about block rdd_3_183
457082:16/03/21 19:28:47.602 TRACE BlockInfoManager: Task 1662 trying to remove block rdd_3_183
500373:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to put rdd_3_183
500374:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to acquire read lock for rdd_3_183
500375:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to acquire write lock for rdd_3_183
500376:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 acquired write lock for rdd_3_183
517257:16/03/21 19:28:56.299 INFO BlockInfoManager: ****** taskAttemptId is: 1662, info.writerTask is: 1681, blockID is: rdd_3_183 so AssertionError happeneds here*****
517258-16/03/21 19:28:56.299 ERROR Executor: Exception in task 177.0 in stage 10.0 (TID 1662)
517259-java.lang.AssertionError: assertion failed
517260- at scala.Predef$.assert(Predef.scala:151)
517261- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:356)
517262- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:351)
517263- at scala.Option.foreach(Option.scala:257)
517264- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:351)
517265- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:350)
517266- at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
517267- at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:350)
517268- at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:626)
517269- at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:238)
{quote}

When memory for RDD storage is insufficient and several partitions have to be evicted, this _AssertionError_ may occur.
In the example above, several partitions (including rdd_3_183) needed to be evicted while _Task 1662_ was running. _Task 1662_ first acquired the read and write locks, then called _dropBlock_ from _MemoryStore.evictBlocksToFreeSpace_, which actually dropped _rdd_3_183_ from memory. Because _newEffectiveStorageLevel.isValid_ was false, execution went into _BlockInfoManager.removeBlock_, but _writeLocksByTask_ is not updated there, so it still lists rdd_3_183 as write-locked by _Task 1662_.

Unfortunately, _Task 1681_ had already started and needed to recompute rdd\_3\_183 to produce its target RDD, so it acquired the write lock on rdd\_3\_183. When _Task 1662_ finally called _releaseAllLocksForTask_, it found the stale entry for rdd\_3\_183 while _info.writerTask_ was 1681 (see the log above), and the _AssertionError_ occurred.
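
The sequence above can be replayed with a small toy model. This is only a sketch in Python, not Spark's actual _BlockInfoManager_ code: the class _ToyBlockInfoManager_, its method names, and the _clear_bookkeeping_on_remove_ flag are hypothetical and exist only for illustration. The flag models one possible fix (clearing the per-task write-lock entry when a block is removed), which is an assumption and not a confirmed patch.

```python
from collections import defaultdict

NO_WRITER = -1  # toy marker for "no task holds the write lock"


class ToyBlockInfoManager:
    """Hypothetical, heavily simplified model of per-block write-lock bookkeeping."""

    def __init__(self, clear_bookkeeping_on_remove):
        self.writer_task = {}                         # block id -> task holding the write lock
        self.write_locks_by_task = defaultdict(set)   # task id -> block ids it write-locked
        self.clear_bookkeeping_on_remove = clear_bookkeeping_on_remove

    def lock_for_writing(self, task, block):
        # Registers the block if it is unknown, then takes the write lock.
        if self.writer_task.get(block, NO_WRITER) != NO_WRITER:
            raise RuntimeError("block is already write-locked")
        self.writer_task[block] = task
        self.write_locks_by_task[task].add(block)

    def remove_block(self, task, block):
        # The block's info is dropped; the per-task set is cleaned up
        # only when the (assumed) fix is enabled.
        if self.writer_task.get(block) != task:
            raise RuntimeError("caller does not hold the write lock")
        del self.writer_task[block]
        if self.clear_bookkeeping_on_remove:
            self.write_locks_by_task[task].discard(block)

    def release_all_locks_for_task(self, task):
        for block in self.write_locks_by_task.pop(task, set()):
            if block in self.writer_task:
                # Plays the role of the assertion seen in the stack trace.
                if self.writer_task[block] != task:
                    raise AssertionError(
                        f"taskAttemptId is {task}, writerTask is {self.writer_task[block]}")
                self.writer_task[block] = NO_WRITER


def replay(clear_bookkeeping_on_remove):
    mgr = ToyBlockInfoManager(clear_bookkeeping_on_remove)
    mgr.lock_for_writing(1662, "rdd_3_183")   # Task 1662 acquires the write lock
    mgr.remove_block(1662, "rdd_3_183")       # eviction ends in removeBlock
    mgr.lock_for_writing(1681, "rdd_3_183")   # Task 1681 recomputes the block
    try:
        mgr.release_all_locks_for_task(1662)  # Task 1662 finishes
    except AssertionError:
        return "AssertionError"
    return "ok"
```

In this model, _replay(False)_ hits the assertion because the stale entry for Task 1662 survives the removal, while _replay(True)_ completes normally.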


