You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by "Joshua Villanueva (Jira)" <ji...@apache.org> on 2019/11/13 06:44:00 UTC

[jira] [Created] (ZEPPELIN-4447) Zeppelin notebook fails to run spark jobs (org.apache.spark.shuffle.FetchFailedException * Failed to open file)

Joshua Villanueva created ZEPPELIN-4447:
-------------------------------------------

             Summary: Zeppelin notebook fails to run spark jobs  (org.apache.spark.shuffle.FetchFailedException * Failed to open file)
                 Key: ZEPPELIN-4447
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4447
             Project: Zeppelin
          Issue Type: Bug
          Components: Kubernetes, spark
    Affects Versions: 0.7.3
         Environment: {code:java}
// Spark-submit config

--conf spark.kubernetes.authenticate.driver.serviceAccountName=*
--conf spark.executor.cores=5
--conf spark.executor.memory=5g
--conf spark.driver.memory=5g
--conf spark.kubernetes.driver.docker.image=*
--conf spark.kubernetes.executor.docker.image=*
--conf spark.local.dir=/tmp/spark-local
--conf spark.executor.instances=3    
--conf spark.dynamicAllocation.enabled=true
--conf spark.shuffle.service.enabled=true    
--conf spark.kubernetes.shuffle.labels="*"    
--conf spark.dynamicAllocation.maxExecutors=3   
--conf spark.dynamicAllocation.minExecutors=1    
--conf spark.kubernetes.shuffle.namespace=*    
--conf spark.kubernetes.docker.image.pullPolicy=IfNotPresent    
--conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0    
--conf spark.kubernetes.resourceStagingServer.uri=http://*{code}
            Reporter: Joshua Villanueva


I'm running zeppelin on a Kubernetes cluster (v1.13.4) and the spark interpreter was running smoothly until I installed and used pandas and seaborn for pyspark (on the zeppelin image and spark executor image). The executor pods have 5 cores and 1 G memory each. 
{code:java}
// Sample zeppelin paragraph
%pyspark
import seaborn as sns
#Plot success rate by age bin
x = df_age.groupby("*").agg({"*":"mean"}).sort("*").toPandas()sns.barplot("*","avg(*)", data = x, color = "cadetblue")
{code}

I tried looking for the directory that is shown in the error and discovered that only 1/3 executor pods has the indicated directory. There are times when the paragraphs I made run perfectly. 

_Side note:_ _I tried increasing the executor memory to 5G but it didn't reflect with the spawned pods._
{panel:title=Error logs}
FetchFailed(BlockManagerId(1, 192.168.3.148, 7337, None), shuffleId=39, mapId=0, reduceId=100, message=

org.apache.spark.shuffle.FetchFailedException: Failure while fetching StreamChunkId\{streamId=1309473202020, chunkIndex=0}: java.lang.RuntimeException: Failed to open file: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index at

org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:249)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:174)

at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:105)

at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:95)

at org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:89)

at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:125)

at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)

at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)

at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)

at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)

at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)

at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)

at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)

at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)

at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)

at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)

at org.spark_project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)

at org.spark_project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)

at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)

at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)

at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)

at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)

at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)

at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:240)

... 34 more

Caused by: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

at java.io.FileInputStream.open0(Native Method)

at java.io.FileInputStream.open(FileInputStream.java:195)

at java.io.FileInputStream.<init>(FileInputStream.java:138)

at org.apache.spark.network.shuffle.ShuffleIndexInformation.<init>(ShuffleIndexInformation.java:41)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:111)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:109)

at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)

at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)

... 40 more

 

at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)

at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)

at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:59)

at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)

at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)

at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)

at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)

at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)

at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)

at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)

at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)

at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)

at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)

at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)

at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)

at org.apache.spark.scheduler.Task.run(Task.scala:108)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.spark.network.client.ChunkFetchFailureException: Failure while fetching StreamChunkId\{streamId=1309473202020, chunkIndex=0}: java.lang.RuntimeException: Failed to open file: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:249)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:174)

at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:105)

at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:95)

at org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:89)

at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:125)

at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)

at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)

at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)

at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)

at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)

at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)

at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)

at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)

at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)

at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)

at org.spark_project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)

at org.spark_project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)

at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)

at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)

at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)

at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)

at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)

at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:240)

... 34 more

Caused by: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

at java.io.FileInputStream.open0(Native Method)

at java.io.FileInputStream.open(FileInputStream.java:195)

at java.io.FileInputStream.<init>(FileInputStream.java:138)

at org.apache.spark.network.shuffle.ShuffleIndexInformation.<init>(ShuffleIndexInformation.java:41)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:111)

at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:109)

at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)

at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)

... 40 more

 

at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:182)

at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)

at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)

at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)

at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)

at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)

at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)

at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

... 1 more

 
{panel}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)