Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2016/01/24 00:59:39 UTC

[jira] [Resolved] (SPARK-9367) spark sql select * causes driver java.lang.OutOfMemoryError: Requested array size exceeds VM limit

     [ https://issues.apache.org/jira/browse/SPARK-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-9367.
-------------------------------
    Resolution: Invalid

It sounds like you're trying to collect too much data to the driver. Without more information, though, this JIRA is no longer actionable (it's really old), so I'm resolving this as "invalid" for now.
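
For anyone who hits the same error: in the 1.x releases a global ORDER BY combined with LIMIT is executed as a takeOrdered on the driver followed by a re-parallelize of the collected rows (that is the ParallelCollectionPartition in the stack trace below), so a 20-million-row LIMIT asks the driver to serialize one enormous buffer, which overflows the JVM's ~2 GB array cap no matter how large the heap is. A minimal sketch of a pattern that keeps the work on the executors instead, written against the Spark 1.x API (the table names are the reporter's and purely illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SortedCtasSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sorted-ctas"))
        val hive = new HiveContext(sc)

        // SORT BY orders rows within each partition and never funnels the
        // whole result through the driver, unlike a global ORDER BY + LIMIT.
        hive.sql(
          "create table spark.tablsetest as " +
            "select * from bi_dpa.dpa_ord_bill_tf sort by member_id")

        // When rows genuinely must reach the driver, cap them explicitly:
        // take(n) transfers only n rows, never the whole table.
        hive.sql("select * from spark.tablsetest").take(100).foreach(println)

        sc.stop()
      }
    }

If a total ordering really is required, writing the sorted data without any LIMIT is usually safer, since a plain ORDER BY with no LIMIT is planned as a distributed range-partitioned sort rather than a collect.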

> spark sql select * causes driver java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-9367
>                 URL: https://issues.apache.org/jira/browse/SPARK-9367
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Ricky Yang
>
> When I run the following Spark SQL statement to read data from a Hive table:
> create table spark.tablsetest as select * from bi_dpa.dpa_ord_bill_tf order by member_id limit 20000000;
> with the driver started using the following parameters:
> spark-sql --driver-memory 48g --executor-memory 24g --driver-java-options "-XX:PermSize=1024M -XX:MaxPermSize=2048M"
> unfortunately, the driver fails with the following errors:
> 15/07/27 10:22:43 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-20] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> 	at java.util.Arrays.copyOf(Arrays.java:2271)
> 	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> 	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> 	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> 	at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1852)
> 	at java.io.ObjectOutputStream.write(ObjectOutputStream.java:708)
> 	at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:134)
> 	at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
> 	at com.esotericsoftware.kryo.io.Output.close(Output.java:165)
> 	at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:162)
> 	at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:139)
> 	at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$writeObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:65)
> 	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1239)
> 	at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> 	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
> 	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
> 	at org.apache.spark.scheduler.Task$.serializeWithDependencies(Task.scala:168)
> 	at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:467)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$org$apache$spark$scheduler$TaskSchedulerImpl$$resourceOfferSingleTaskSet$1.apply$mcVI$sp(TaskSchedulerImpl.scala:231)
> 15/07/27 10:22:43 ERROR ErrorMonitor: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-20] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> 	at java.util.Arrays.copyOf(Arrays.java:2271)
> 	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> 	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> 	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> 	at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1852)
> 	at java.io.ObjectOutputStream.write(ObjectOutputStream.java:708)
> 	at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:134)
> 	at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
> 	at com.esotericsoftware.kryo.io.Output.close(Output.java:165)
> 	at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:162)
> 	at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:139)
> 	at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$writeObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:65)
> 	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1239)
> 	at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> 	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
> 	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
> 	at org.apache.spark.scheduler.Task$.serializeWithDependencies(Task.scala:168)
> 	at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:467)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$org$apache$spark$scheduler$TaskSchedulerImpl$$resourceOfferSingleTaskSet$1.apply$mcVI$sp(TaskSchedulerImpl.scala:231)
> 15/07/27 10:22:46 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 15/07/27 10:22:46 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 15/07/27 10:22:46 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveBroadcast(2,true)] in 1 attempts
> akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#2011779764]] had already been terminated.
> 	at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
> 	at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:299)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
> 	at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:127)
> 	at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> 	at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
> 	at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
> 	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1215)
> 	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
> 	at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
> 15/07/27 10:22:46 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> 15/07/27 10:22:49 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveBroadcast(2,true)] in 2 attempts
> akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#2011779764]] had already been terminated.
> 	at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
> 	at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:299)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
> 	at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:127)
> 	at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> 	at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
> 	at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
> 	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1215)
> 	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
> 	at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
> 15/07/27 10:22:52 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveBroadcast(2,true)] in 3 attempts
> akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#2011779764]] had already been terminated.
> 	at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
> 	at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:299)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
> 	at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:127)
> 	at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> 	at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
> 	at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
> 	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1215)
> 	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
> 	at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
> 15/07/27 10:22:55 ERROR ContextCleaner: Error cleaning broadcast 2
> org.apache.spark.SparkException: Error sending message [message = RemoveBroadcast(2,true)]
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
> 	at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:127)
> 	at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> 	at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
> 	at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
> 	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1215)
> 	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
> 	at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
> Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#2011779764]] had already been terminated.
> 	at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
> 	at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:299)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
> 	... 13 more
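
Reading the trace: the OutOfMemoryError is thrown while serializing a ParallelCollectionPartition, i.e. after the sorted, limited result has already been gathered on the driver and is being re-parallelized for the CTAS write. That is why raising --driver-memory to 48g does not help: "Requested array size exceeds VM limit" means the JVM refused to allocate a single byte array larger than roughly 2 GB, regardless of heap size. Depending on where the size limit is crossed, setting spark.driver.maxResultSize (available since Spark 1.2) may at least turn the failure into a clean, early abort of the collect instead of an OOM; an illustrative invocation with hypothetical values:

spark-sql --driver-memory 8g --executor-memory 24g \
  --conf spark.driver.maxResultSize=2g \
  --driver-java-options "-XX:PermSize=1024M -XX:MaxPermSize=2048M"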


