Posted to user@spark.apache.org by Bosung Seo <bo...@brightcloud.com> on 2015/07/17 18:24:08 UTC

Exception raised during large Spark job against Cassandra ring

Hello,



I have been having trouble getting large Spark jobs to complete against my
Cassandra ring.



I’m finding that the CPU goes to 100% on one of the nodes, and then, after
many hours, the job fails.



Here are my Spark settings:

      .set(*"spark.cassandra.input.split.size_in_mb"*, *"128"*)
      .set(*"spark.cassandra.output.batch.size.rows"*, *"300"*)
      .set(*"spark.network.timeout"*, *"21600"*)
      .set(*"spark.akka.frameSize"*, *"150"*)
      .set(*"spark.executor.heartbeatInterval"*, *"60"*)
      .set(*"spark.akka.timeout"*, *"300"*)
      .set(*"spark.akka.threads"*, *"24"*)



This JIRA ticket seems to discuss a 2 GB limit, but I cannot tell from the
discussion there what the suggested solution would be for my setup:

https://issues.apache.org/jira/browse/SPARK-6190
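
If I am reading the trace below correctly, the failure comes out of
sun.nio.ch.FileChannelImpl.map, which Spark's DiskStore uses to memory-map
a block file into a single buffer; a mapped buffer's size is an int, so any
single block larger than Integer.MAX_VALUE bytes (about 2 GiB) throws. My
guess from the ticket is that the fix on my side is to spread the data over
more, smaller partitions, along the lines of the sketch below (the
partition count is an illustrative guess, and myRdd is a stand-in for my
actual RDD). Is that the right direction?

      // Illustrative sketch: force more, smaller partitions so that no
      // single cached or shuffled block approaches the 2 GiB mapping
      // limit. 2000 is a guess, and myRdd is a stand-in name.
      val smallerBlocks = myRdd.repartition(2000)

      // On the Cassandra read side, a smaller input split should have a
      // similar effect, e.g.:
      //   .set("spark.cassandra.input.split.size_in_mb", "64")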



Here are the exception details:



Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 465 in stage 2.0 failed 4 times, most recent failure: Lost task 465.3 in stage 2.0 (TID 2855, 172.31.44.9): java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)
        at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:511)
        at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:302)
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:744)

        at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:744)



Driver stacktrace:

        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



Thanks in advance for any help you can provide…

Bosung