Posted to user@spark.apache.org by Khaled Ammar <kh...@gmail.com> on 2015/07/07 14:40:38 UTC

Question about master memory requirement and GraphX pagerank performance!

Hi all,

I am fairly new to Spark and wonder if you can help me. I am exploring
GraphX/Spark by running the PageRank example on a medium-sized graph (12 GB).

My cluster is 1+16 machines: the master has 15 GB of memory and 2 cores,
and each worker has 30 GB of memory and 4 cores.

I launch the job with this command:

/home/ubuntu/spark-1.3.0/bin/spark-submit --master spark://<Master IP>:7077
--class org.apache.spark.examples.graphx.Analytics
/home/ubuntu/spark-1.3.0/examples/target/scala-2.10/spark-examples-1.3.0-hadoop1.0.4.jar
pagerank /user/ubuntu/input/<dataset> --numEPart=64
--output=/user/ubuntu/spark/16_pagerank --numIter=30


I have two questions:

1- When I set "SPARK_EXECUTOR_MEMORY=25000M", I received errors because the
master cannot allocate this much memory, since the launched task includes
"-Xms25000M". Based on my understanding, the master does not do any
computation, so this executor memory should only be required on the worker
machines. Why can't the application start without allocating the full
executor memory on the master as well as on all the workers?
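In case it matters, I assume the per-application equivalent of the
cluster-wide SPARK_EXECUTOR_MEMORY variable would be the --executor-memory
flag on spark-submit (with --driver-memory as a separate cap for the driver
process on the master). This is only a sketch of what I believe the flags
look like in Spark 1.3's standalone mode, not something I have verified:

```shell
# Sketch (assumed flags): set executor memory per application instead of
# cluster-wide, and give the driver on the master a smaller, separate cap.
/home/ubuntu/spark-1.3.0/bin/spark-submit \
  --master spark://<Master IP>:7077 \
  --driver-memory 10G \
  --executor-memory 25G \
  --class org.apache.spark.examples.graphx.Analytics \
  /home/ubuntu/spark-1.3.0/examples/target/scala-2.10/spark-examples-1.3.0-hadoop1.0.4.jar \
  pagerank /user/ubuntu/input/<dataset> --numEPart=64 \
  --output=/user/ubuntu/spark/16_pagerank --numIter=30
```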

2- I changed the executor memory to 15 GB and the application started fine.
However, it did not finish the thirty iterations even after 7 hours. One
iteration was taking 4+ hours, with 400+ GB of input. I must be doing
something wrong; any comments?

-- 
Thanks,
-Khaled Ammar
www.khaledammar.com