Posted to user@spark.apache.org by Gilberto Lira <gi...@scanboo.com.br> on 2014/09/18 20:48:03 UTC

Spark on EC2

Hello, I am trying to run a Python script that uses KMeans from MLlib,
and I'm not getting anywhere. I'm using a c3.xlarge instance as the master
and 10 c3.large instances as slaves. In the code I map over a 600MB
CSV file in S3, where each row has 128 integer columns. The problem is that
around TID 7 my slave stops responding, and I cannot finish my
processing. Could you help me with this problem? I am sending my script
attached for review.

Thank you,
Gilberto

Re: Spark on EC2

Posted by Burak Yavuz <by...@stanford.edu>.
Hi Gilberto,

Could you please attach the driver logs as well, so that we can pinpoint what's going wrong? Could you also add the flag
`--driver-memory 4g` while submitting your application and try that as well?

Best,
Burak
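
For readers without the attachment: the script itself is not included in this archive, so the following is only a rough sketch of the kind of job described in the question (map over a CSV of integer columns, then train MLlib KMeans). The names parse_row and run_kmeans, the arguments path, k, and num_partitions, and the choice of maxIterations=10 are all assumptions made for illustration, not Gilberto's actual code.

```python
# Illustrative sketch only; the script attached to the original post is not shown here.

def parse_row(line):
    """Turn one CSV line of integers into a list of floats for MLlib."""
    return [float(value) for value in line.strip().split(",")]


def run_kmeans(sc, path, k, num_partitions=40):
    """Cluster the rows of a CSV file.

    `sc` is an existing SparkContext; `path`, `k`, and `num_partitions`
    are hypothetical placeholders, not values from the thread.
    """
    # Imported here so parse_row can be exercised without Spark installed.
    from pyspark.mllib.clustering import KMeans

    rows = (sc.textFile(path, minPartitions=num_partitions)
              .map(parse_row)
              .cache())  # KMeans passes over the data several times
    return KMeans.train(rows, k, maxIterations=10)
```

A script built around something like this could then be submitted with the flag suggested above, for example `spark-submit --driver-memory 4g your_script.py`, where your_script.py is a placeholder name. Whether more driver memory resolves the hang is not established in this thread; the driver logs would be needed to tell.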

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

