Posted to issues@spark.apache.org by "Ihor Bobak (JIRA)" <ji...@apache.org> on 2015/05/13 17:30:02 UTC

[jira] [Created] (SPARK-7603) Crash of thrift server when doing SQL without "limit"

Ihor Bobak created SPARK-7603:
---------------------------------

             Summary: Crash of thrift server when doing SQL without "limit"
                 Key: SPARK-7603
                 URL: https://issues.apache.org/jira/browse/SPARK-7603
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.3.1
         Environment: Hortonworks Sandbox 2.1  with Spark 1.3.1
            Reporter: Ihor Bobak


I have 2 tables in Hive: one with 120 thousand records, the other about 5 times smaller.

I'm running a standalone cluster on a single VM, and I start the Thrift server with the following command:
./start-thriftserver.sh --conf spark.executor.memory=2048m --conf spark.driver.memory=1024m

My spark-defaults.conf contains:
spark.master                     spark://sandbox.hortonworks.com:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://sandbox.hortonworks.com:8020/user/pdi/spark/logs
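
For reference, the Thrift server can be queried with beeline (a minimal sketch; the default HiveServer2-compatible port 10000 is an assumption about this setup):

beeline -u jdbc:hive2://sandbox.hortonworks.com:10000
# the Spark Thrift server speaks the HiveServer2 protocol, so any Hive JDBC client works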


So, when I run the following SQL:

select <some fields from header>, <some fields from details>
from  
	vw_salesorderdetail as d 
	left join vw_salesorderheader as h on h.SalesOrderID = d.SalesOrderID limit 2000000000;

everything is fine, even though the limit is absurdly large (again: the result set returned is just 120,000 records).
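
(Side note: to separate the join itself from delivering the full result set, a tiny-result form of the query can be run through the spark-sql CLI - a sketch, assuming the same Hive metastore configuration; count(*) exercises the join while keeping the result a single row:)

./bin/spark-sql --conf spark.driver.memory=1024m \
  -e "select count(*) from vw_salesorderdetail d left join vw_salesorderheader h on h.SalesOrderID = d.SalesOrderID"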

But if I run the same query without the limit clause, execution hangs - see here: http://postimg.org/image/fujdjd16f/42945a78/

and there are a lot of exceptions in the Thrift server logs - here they are:

15/05/13 17:59:27 INFO TaskSetManager: Starting task 158.0 in stage 48.0 (TID 953, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
15/05/13 18:00:01 INFO TaskSetManager: Finished task 150.0 in stage 48.0 (TID 945) in 36166 ms on sandbox.hortonworks.com (152/200)
15/05/13 18:00:02 ERROR Utils: Uncaught exception in thread Spark Context Cleaner
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:147)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:143)
	at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
Exception in thread "Spark Context Cleaner" 15/05/13 18:00:02 ERROR Utils: Uncaught exception in thread task-result-getter-1
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.String.<init>(String.java:315)
	at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:562)
	at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
	at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
	at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:173)
	at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
	at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:621)
	at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:379)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Exception in thread "task-result-getter-1" 15/05/13 18:00:04 INFO TaskSetManager: Starting task 159.0 in stage 48.0 (TID 954, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:147)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:143)
	at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.String.<init>(String.java:315)
	at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:562)
	at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
	at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
	at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:173)
	at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
	at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:621)
	at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:379)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
15/05/13 18:00:05 INFO TaskSetManager: Finished task 154.0 in stage 48.0 (TID 949) in 40665 ms on sandbox.hortonworks.com (153/200)
15/05/13 18:00:20 ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
15/05/13 18:00:28 ERROR Utils: Uncaught exception in thread task-result-getter-2
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
15/05/13 18:00:29 INFO TaskSetManager: Starting task 160.0 in stage 48.0 (TID 955, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
15/05/13 18:00:31 ERROR ActorSystemImpl: exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
	at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
	at java.lang.Thread.run(Thread.java:744)
15/05/13 18:00:31 INFO ActorSystemImpl: starting new LARS thread
15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
	at java.lang.Class.getDeclaredMethod(Class.java:2002)
	at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
	at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
	at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
	at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
	at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
	at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
	at java.lang.Thread.run(Thread.java:744)
15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
	at java.lang.Class.getDeclaredMethod(Class.java:2002)
	at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
	at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
	at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
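
It looks like every OutOfMemoryError above is in a driver-side thread (task-result-getter, Spark Context Cleaner, the Akka dispatchers), so the driver heap is apparently being exhausted while task results are collected. A possible mitigation, untested, with values that are only guesses:

./start-thriftserver.sh \
  --conf spark.executor.memory=2048m \
  --conf spark.driver.memory=4g \
  --conf spark.driver.maxResultSize=2g
# spark.driver.maxResultSize (default 1g) makes Spark abort the job with a clear error
# once collected results exceed the limit, instead of the driver dying with an OOM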




At the same time, the Thrift server itself kept producing tons of log output. Feel free to contact me - I will send you the full logs.