Posted to user@spark.apache.org by t_ras <ma...@netvision.net.il> on 2015/10/04 13:26:07 UTC
java.lang.OutOfMemoryError: GC overhead limit exceeded
I get java.lang.OutOfMemoryError: GC overhead limit exceeded when running
a count action on a file.
The file is a 217GB CSV file.
I'm using 10 r3.8xlarge (Ubuntu) machines with CDH 5.3.6 and Spark 1.2.0.
Configuration:
spark.app.id:local-1443956477103
spark.app.name:Spark shell
spark.cores.max:100
spark.driver.cores:24
spark.driver.extraLibraryPath:/opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/lib/hadoop/lib/native
spark.driver.host:ip-172-31-34-242.us-west-2.compute.internal
spark.driver.maxResultSize:300g
spark.driver.port:55123
spark.eventLog.dir:hdfs://ip-172-31-34-242.us-west-2.compute.internal:8020/user/spark/applicationHistory
spark.eventLog.enabled:true
spark.executor.extraLibraryPath:/opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/lib/hadoop/lib/native
spark.executor.id:driver
spark.executor.memory:200g
spark.fileserver.uri:http://172.31.34.242:51424
spark.jars:
spark.master:local[*]
spark.repl.class.uri:http://172.31.34.242:58244
spark.scheduler.mode:FIFO
spark.serializer:org.apache.spark.serializer.KryoSerializer
spark.storage.memoryFraction:0.9
spark.tachyonStore.folderName:spark-88bd9c44-d626-4ad2-8df3-f89df4cb30de
spark.yarn.historyServer.address:http://ip-172-31-34-242.us-west-2.compute.internal:18088
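For reference, a minimal sketch of how the non-default settings above could
be set programmatically with the SparkConf API of that era, rather than via
spark-defaults.conf (the variable names are illustrative, not from the
original post):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch: reproduce the key settings from the listing above.
val conf = new SparkConf()
  .setAppName("Spark shell")
  .setMaster("local[*]")
  .set("spark.executor.memory", "200g")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.storage.memoryFraction", "0.9")
val sc = new SparkContext(conf)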
Here is what I ran:
val testrdd = sc.textFile(
  "hdfs://ip-172-31-34-242.us-west-2.compute.internal:8020/user/jethro/tables/edw_fact_lsx_detail/edw_fact_lsx_detail.csv")
testrdd.persist(org.apache.spark.storage.StorageLevel.MEMORY_ONLY_SER)
testrdd.count()
If I don't force it into memory it works fine, but considering the cluster
I'm running on, it should fit in memory comfortably.
Any ideas?
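For comparison, a minimal sketch of the same count with a storage level that
can spill to disk (MEMORY_AND_DISK_SER) instead of failing when the
serialized partitions do not all fit in memory; the path and variable name
are taken from the snippet above:

import org.apache.spark.storage.StorageLevel

// Same job, but partitions that do not fit in memory spill to local disk
// instead of driving the JVM into GC overhead.
val testrdd = sc.textFile(
  "hdfs://ip-172-31-34-242.us-west-2.compute.internal:8020/user/jethro/tables/edw_fact_lsx_detail/edw_fact_lsx_detail.csv")
testrdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
testrdd.count()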
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-GC-overhead-limit-exceeded-tp24918.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
Posted by Ted Yu <yu...@gmail.com>.
1.2.0 is quite old.
You may want to try 1.5.1, which was released this past week.
Cheers
> On Oct 4, 2015, at 4:26 AM, t_ras <ma...@netvision.net.il> wrote:
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org