Posted to user@spark.apache.org by io...@barclays.com on 2013/11/18 14:12:31 UTC

How to efficiently manage resources across a cluster and avoid GC overhead exceeded errors?

Hi,

I have a cluster of 20 servers, each with 24 cores and 30GB of RAM allocated to Spark. Spark runs in standalone mode.
I am trying to load files totalling 200+GB and cache the rows using ".cache()".
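For reference, the load is roughly the following (a sketch; the HDFS path is a placeholder):

    val lines = sc.textFile("hdfs://namenode/path/to/files") // placeholder path
    lines.cache()   // mark the RDD for in-memory caching
    lines.count()   // the first action materializes the cache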

What I would like to do is the following (at the moment from the Scala console):
-Evenly load the files across the 20 servers (preferably using all 20*24 cores for the load)
-Verify that the data are loaded as NODE_LOCAL
Looking at the :4040 console, in some runs I see mostly NODE_LOCAL tasks, but in others mostly ANY. Is there a way to identify what a task (TID) scheduled at locality ANY is actually doing?
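As a rough check of where partitions actually end up, something like the following sketch records the hostname that computes each partition (this is my own illustration, not an established API for locality):

    // Sketch: report which host computes each partition, to spot imbalance.
    val placement = lines.mapPartitionsWithIndex { (idx, iter) =>
      Iterator((idx, java.net.InetAddress.getLocalHost.getHostName, iter.size))
    }.collect()
    placement.foreach { case (idx, host, n) =>
      println("partition " + idx + " -> " + host + " (" + n + " rows)")
    }

Note that counting rows traverses every partition, so this is only worth running on a sample of the data.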

If I allocate less than ~double the memory I need, I get an OutOfMemory error.

If I use the numeric partitions argument of textFile, i.e.

    sc.textFile("hdfs://...", 20)

then the error goes away.
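For example (a sketch; 480 = 20 servers * 24 cores is an assumed value, and MEMORY_AND_DISK is one alternative to plain cache() that spills instead of failing):

    import org.apache.spark.storage.StorageLevel

    // Request more input splits so the load spreads across all cores.
    // 480 = 20 servers * 24 cores is an assumption, not a tested value.
    val lines = sc.textFile("hdfs://namenode/path/to/files", 480)
    // MEMORY_AND_DISK spills partitions to disk rather than throwing OOM.
    lines.persist(StorageLevel.MEMORY_AND_DISK)
    lines.count()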

On the other hand, if I allocate enough memory, the admin console shows that some of my workers carry far more load than others, some less than half. I understand that I could use a partitioner to balance the data, but I wouldn't expect an OOME when nodes are significantly under-used. Am I missing something?
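To illustrate what I mean by balancing with a partitioner (a sketch; keying on each line's hashCode and using 480 partitions are arbitrary, illustrative choices):

    import org.apache.spark.SparkContext._   // pair-RDD functions (auto-imported in the shell)
    import org.apache.spark.HashPartitioner

    // Key each row, hash-partition for an even spread, then drop the keys.
    val balanced = lines
      .map(line => (line.hashCode, line))
      .partitionBy(new HashPartitioner(480))
      .map(_._2)
    balanced.cache()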

Thanks,

Ioannis Deligiannis

