Posted to user@spark.apache.org by Yonathan Perez <yo...@gmail.com> on 2014/03/02 07:29:51 UTC

OutOfMemoryError when loading input file

Hello,

I'm trying to run a simple test program that loads a large file (~12.4GB)
into the memory of a single many-core machine.
The machine I'm using has more than enough memory (1TB RAM) and 64 cores
(of which I want to use 16 for worker threads).
Even though I set both the executor memory (spark.executor.memory) to 200GB
in SparkContext and the JVM memory to 200GB (-Xmx200g) in spark-env.sh,
I keep getting errors when trying to load the input:
"java.lang.OutOfMemoryError: GC overhead limit exceeded".
I believe that the memory configuration parameters I pass do not stick, as
I get the following message when running:
"14/03/01 22:09:31 INFO storage.MemoryStore: MemoryStore started with
capacity 883.2 MB."
Obviously I'm missing something when configuring Spark, but I can't figure
out what, and I'd appreciate your help.
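
One thing I notice: if I understand correctly, the MemoryStore capacity is
derived from the JVM heap, and 883.2 MB looks much more like a fraction of a
default-sized heap than of 200GB, so I suspect the -Xmx flag never reaches the
JVM that actually runs my program. To confirm that, I can print the effective
heap limit from inside main (a minimal check, nothing Spark-specific):

    // Print the maximum heap the running JVM will use (the effective -Xmx).
    // If this is far below 200GB, the options from spark-env.sh are not
    // being picked up by the JVM that runs the application.
    val maxHeapMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
    println("JVM max heap: " + maxHeapMb + " MB")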

The test program I'm running (not through the shell, but as a standalone
Scala app):

import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object LoadBenchmark {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("local[16]")
      .setAppName("Load Benchmark")
      .set("spark.executor.memory", "200g")
    val sc = new SparkContext(conf)
    println("LOADING INPUT FILE")
    val edges = sc.textFile("/lfs/madmax/0/yonathan/half_twitter_rv.txt").cache()
    val cnt = edges.count()
    println("edge count: "+ cnt)
  }
}
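
A variant I'm planning to try, mostly to confirm that the setting at least
reaches the SparkConf and to spread the 12.4GB file over more partitions when
loading (just a sketch; the object name and the explicit split count of 64 are
arbitrary choices on my part):

import org.apache.spark.{SparkConf, SparkContext}

object LoadBenchmarkCheck {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("local[16]")
      .setAppName("Load Benchmark Check")
      .set("spark.executor.memory", "200g")
    val sc = new SparkContext(conf)

    // Read the value back to confirm it is at least present in the SparkConf.
    println("spark.executor.memory = " +
      sc.getConf.get("spark.executor.memory", "<not set>"))

    // Ask textFile for an explicit minimum number of splits so that each
    // cached partition is smaller.
    val edges = sc.textFile("/lfs/madmax/0/yonathan/half_twitter_rv.txt", 64).cache()
    println("edge count: " + edges.count())
  }
}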

The contents of the spark-env.sh file:

#     Examples of app-wide options : -Dspark.serializer
SPARK_JAVA_OPTS+="-Xms200g -Xmx200g -XX:-UseGCOverheadLimit"
export SPARK_JAVA_OPTS
# If using the standalone deploy mode, you can also set variables for it here:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_CORES=16
export SPARK_WORKER_CORES
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
SPARK_WORKER_MEMORY=200g
export SPARK_WORKER_MEMORY
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes

Thank you!

Re: OutOfMemoryError when loading input file

Posted by Yonathan Perez <yo...@gmail.com>.
Thanks for your answer yxzhao, but setting SPARK_MEM doesn't solve the
problem.
I also understand that setting SPARK_MEM is equivalent to calling
SparkConf.set("spark.executor.memory", ...), which I already do.
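
For what it's worth, I can also check from inside the application whether the
environment variable is visible to the process at all (a quick check):

    // Is SPARK_MEM visible to the JVM running the application?
    println("SPARK_MEM = " + sys.env.getOrElse("SPARK_MEM", "<not set>"))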

Any additional advice would be highly appreciated.


