Posted to user@spark.apache.org by MEETHU MATHEW <me...@yahoo.co.in> on 2014/07/10 14:50:22 UTC

Difference between collect() and take(n)

Hi all,

I want to know how collect() works and how it differs from take(). I am reading a 330 MB file that has 43 lakh (4.3 million) rows with 13 columns, and calling take(4300000) to save the result to a variable works. The same does not work with collect(). Is there any difference in how the two operations work?


Also, I wanted to set the Java heap size for my Spark program. I set it using spark.executor.extraJavaOptions in spark-default-conf.sh. Now I want to set the same for the worker. Can I do that with SPARK_DAEMON_JAVA_OPTS? Is the following syntax correct?

SPARK_DAEMON_JAVA_OPTS="-XX:+UseCompressedOops -Xmx3g"


Thanks & Regards, 
Meethu M