Posted to issues@spark.apache.org by "Liang Lee (JIRA)" <ji...@apache.org> on 2016/04/18 05:03:25 UTC

[jira] [Created] (SPARK-14695) Error occurs when using OFF_HEAP persistent level

Liang Lee created SPARK-14695:
---------------------------------

             Summary: Error occurs when using OFF_HEAP persistent level 
                 Key: SPARK-14695
                 URL: https://issues.apache.org/jira/browse/SPARK-14695
             Project: Spark
          Issue Type: Bug
          Components: Block Manager, Spark Core
    Affects Versions: 1.6.0
         Environment: Spark 1.6.0
Tachyon 0.8.2
Hadoop 2.6.0
            Reporter: Liang Lee


When running a PageRank job through the default examples, e.g., the class 'org.apache.spark.examples.graphx.Analytics' in the spark-examples-1.6.0-hadoop2.6.0.jar package, we got the following errors:
16/04/18 02:30:01 WARN scheduler.TaskSetManager: Lost task 9.0 in stage 6.0 (TID 53, R1S1): java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -1
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822)
        at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


We use the following script to submit the job:
/Hadoop/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class org.apache.spark.examples.graphx.Analytics /Hadoop/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar pagerank /data/soc-LiveJournal1.txt --output=/output/live-off.res --numEPart=10 --numIter=1 --edgeStorageLevel=OFF_HEAP --vertexStorageLevel=OFF_HEAP

When we set the storage level to MEMORY_ONLY or DISK_ONLY, there is no error and the job finishes correctly.
But when we set the storage level to OFF_HEAP, which means Tachyon is used for storage, the error occurs.
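
For reference, a minimal Scala sketch of the call sequence we believe the Analytics pagerank driver boils down to (this is our reading of the example code, not a copy of it; the path and partition count mirror the command above):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.GraphLoader
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(new SparkConf().setAppName("PageRank-OFF_HEAP-repro"))
    // Load the edge list with both the edge and vertex RDDs persisted at the requested level.
    val graph = GraphLoader.edgeListFile(
      sc, "/data/soc-LiveJournal1.txt",
      numEdgePartitions = 10,
      edgeStorageLevel = StorageLevel.OFF_HEAP,    // fails: sizeInBytes was negative: -1
      vertexStorageLevel = StorageLevel.OFF_HEAP)  // MEMORY_ONLY / DISK_ONLY run fine
    // One static PageRank iteration, matching --numIter=1.
    val ranks = graph.staticPageRank(1).vertices
    ranks.count()                                  // any action triggers the failing persist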

The executor stack trace is as follows; it seems the block write to Tachyon failed.
16/04/18 02:25:54 ERROR ExternalBlockStore: Error in putValues(rdd_20_1)
java.io.IOException: Fail to cache: null
	at tachyon.client.file.FileOutStream.handleCacheWriteException(FileOutStream.java:276)
	at tachyon.client.file.FileOutStream.close(FileOutStream.java:165)
	at org.apache.spark.storage.TachyonBlockManager.putValues(TachyonBlockManager.scala:126)
	at org.apache.spark.storage.ExternalBlockStore.putIntoExternalBlockStore(ExternalBlockStore.scala:79)
	at org.apache.spark.storage.ExternalBlockStore.putIterator(ExternalBlockStore.scala:67)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:798)
	at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:820)
	at tachyon.client.block.LocalBlockOutStream.flush(LocalBlockOutStream.java:108)
	at tachyon.client.block.LocalBlockOutStream.close(LocalBlockOutStream.java:92)
	at tachyon.client.file.FileOutStream.close(FileOutStream.java:160)
	... 31 more
16/04/18 02:25:54 ERROR Executor: Exception in task 1.0 in stage 10.0 (TID 142)
java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -1
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822)
	at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
16/04/18 02:25:54 INFO CoarseGrainedExecutorBackend: Got assigned task 160
16/04/18 02:25:54 INFO Executor: Running task 1.1 in stage 10.0 (TID 160)
16/04/18 02:25:54 INFO CacheManager: Partition rdd_30_1 not found, computing it
16/04/18 02:25:54 ERROR ExternalBlockStore: Error in getValues(rdd_20_1)
java.io.IOException: tachyon.exception.TachyonException: FileId 1660944383 BlockIndex 0 is not a valid block.
	at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447)
	at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126)
	at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212)
	at org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152)
	at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
	at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
	at scala.Option.flatMap(Option.scala:170)
	at org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147)
	at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486)
	at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
	at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.exception.TachyonException: FileId 1660944383 BlockIndex 0 is not a valid block.
	at tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151)
	at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444)
	... 35 more
16/04/18 02:25:54 INFO CacheManager: Partition rdd_20_1 not found, computing it
16/04/18 02:25:54 INFO BlockManager: Found block rdd_3_1 locally
16/04/18 02:25:54 WARN BlockManager: Block rdd_20_1 already exists on this machine; not re-adding it


And when we check the Storage page of the job UI, we find that some blocks are not cached:

RDD Name	Storage Level	Cached Partitions	Fraction Cached	Size in Memory	Size in ExternalBlockStore	Size on Disk
EdgeRDD 	ExternalBlockStore Serialized 1x Replicated 	18 	90% 	0.0 B 	753.2 MB 	0.0 B
VertexRDD 	ExternalBlockStore Serialized 1x Replicated 	20 	100% 	0.0 B 	191.1 MB 	0.0 B
VertexRDD, VertexRDD 	ExternalBlockStore Serialized 1x Replicated 	20 	100% 	0.0 B 	121.1 MB 	0.0 B
VertexRDD 	ExternalBlockStore Serialized 1x Replicated 	20 	100% 	0.0 B 	121.3 MB 	0.0 B
GraphLoader.edgeListFile - edges (/data/soc-LiveJournal1.txt) 	ExternalBlockStore Serialized 1x Replicated 	20 	100% 	0.0 B 	871.8 MB 	0.0 B
EdgeRDD 	ExternalBlockStore Serialized 1x Replicated 	18 	90% 	0.0 B 	1307.6 MB 	0.0 B

I have checked the configuration many times. Can anyone give me some ideas about this issue? It has puzzled me for two weeks.
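
For reference, the OFF_HEAP path in Spark 1.6 goes through the external block store settings; a minimal sketch of the form they take is below (the Tachyon master address and base directory are placeholders, not necessarily the values from our cluster):

    import org.apache.spark.SparkConf

    // Spark 1.6 external block store (Tachyon) settings; the values here are
    // examples only, not the exact ones from our configuration.
    val conf = new SparkConf()
      .setAppName("Analytics")
      .set("spark.externalBlockStore.url", "tachyon://tachyon-master:19998")
      .set("spark.externalBlockStore.baseDir", "/spark")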
 Thanks



