Posted to user@spark.apache.org by Wayne Adams <wm...@comcast.net> on 2014/06/04 19:20:47 UTC

Re: Spark streaming on load run - How to increase single node capacity?

Hi Rod:  
Not sure about the 2nd item on your list, but for the first one, try raising
the open-file (file descriptor) limit.  Your machine is probably set to 1024
or some other low number (check with ulimit -n).
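A minimal sketch of what I mean (the 4096 value is just an example; pick
whatever your workload actually needs):

    # show the current per-process open-file limit (often defaults to 1024)
    ulimit -n
    # raise it for the current shell session, then start Spark from that shell
    ulimit -n 4096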
-- Wayne





Re: Spark streaming on load run - How to increase single node capacity?

Posted by RodrigoB <ro...@aspect.com>.
Hi Wayne,

Thanks for the reply. I did raise the limit before posting, based on your
previous comment on another post, using ulimit -n 2048. That seemed to help
with the out-of-memory issue.

I'm curious whether this is the standard procedure for scaling a Spark
node's resources vertically, or just a quick workaround. I would expect the
Spark standalone master to expose these settings in a configuration file.
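The closest I've found is conf/spark-env.sh, which the standalone scripts
source at startup. It exposes per-worker resources, though as far as I can
tell not the file-descriptor limit itself, so this is only a sketch of what
is configurable, not a fix for the ulimit issue:

    # conf/spark-env.sh -- sourced by the standalone master/worker scripts
    SPARK_WORKER_CORES=4      # cores a worker may hand out to executors
    SPARK_WORKER_MEMORY=4g    # total memory a worker may hand out
    # the open-file limit still has to be raised at the OS level, e.g. via ulimit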

The second item I'm referring to is the trickiest, since it only occurs
(empty data!) when I increase the number of worker threads with local[N]. I
don't see a real gain from increasing the number of threads; in fact,
performance seems to degrade, as threads appear to wait for others to finish
before returning processed data.
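For reference, by local[N] I mean the master URL passed when launching the
app, e.g. (the class name and jar path here are made up for illustration):

    # run the streaming app locally with 4 worker threads
    ./bin/spark-submit --master "local[4]" \
      --class com.example.MyStreamingApp \
      target/my-streaming-app.jar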

As a general statement, could we say that a high number of threads can be a
problem for small RDDs? Do you agree?

Thanks,
Rod


