Posted to user@spark.apache.org by "lokesh.gidra" <lo...@gmail.com> on 2014/07/14 01:03:28 UTC

Ideal core count within a single JVM

Hello,

What would be an ideal core count for running a Spark job in local mode to get the best CPU utilization? I have a 48-core machine, but the performance of local[48] is poor compared to local[10].
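
For reference, here is roughly how I set the core count (a minimal sketch; the app name is a placeholder and the actual job body is elided):

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalCoreCountTest {
      def main(args: Array[String]): Unit = {
        // local[N] runs Spark in a single JVM with N worker threads.
        val conf = new SparkConf()
          .setMaster("local[48]")   // slow for me; local[10] is much faster
          .setAppName("core-count-test")
        val sc = new SparkContext(conf)
        // ... actual job elided ...
        sc.stop()
      }
    }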


Lokesh




Re: Ideal core count within a single JVM

Posted by "lokesh.gidra" <lo...@gmail.com>.
What you said makes sense. But the problem persists even when I reduce the heap size proportionately. For instance, with a 160 GB heap for 48 cores versus an 80 GB heap for 24 cores, the 24-core run is still faster (although slower than 24 cores with a 160 GB heap) than the 48-core run. Only when I drop to a 40 GB heap with 24 cores do I see the 48-core case perform better than the 24-core case.

Does this mean there is no relation between thread count and heap size? If so, can you please suggest how I can choose heap sizes for a fair comparison?
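
To put numbers on that (a quick sketch, assuming the heap is split evenly across the N worker threads of local[N]):

    // Per-core heap in the three configurations above, assuming an even split.
    val configs = Seq((160, 48), (80, 24), (40, 24))
    configs.foreach { case (heapGb, cores) =>
      println(f"$heapGb GB / $cores cores = ${heapGb.toDouble / cores}%.2f GB per core")
    }
    // Prints 3.33, 3.33 and 1.67 GB per core: the first two configurations
    // give the same per-core heap, yet 24 cores still wins.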

My real trouble is comparing the application's performance when run on (1) a single node with 48 cores and a 160 GB heap, and (2) 8 nodes with 6 cores and 20 GB each. In this comparison (2) performs far better than (1), and I don't understand why.


Lokesh




Re: Ideal core count within a single JVM

Posted by Matei Zaharia <ma...@gmail.com>.
BTW, you can see the number of parallel tasks in the application UI (http://localhost:4040) or in the log messages (e.g. when it says "progress: 17/20", that means there are 20 tasks total and 17 are done). Spark will try to use at least one task per core in local mode, so there might be more of them here, but if your file is big it will also have at least one task per 32 MB block of the file.
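
You can also check programmatically, e.g. in spark-shell (where sc is predefined; the path below is illustrative):

    // Spark runs one task per partition per stage, so compare this
    // number against N in local[N]. The input path is just an example.
    val file = sc.textFile("/data/pagerank-input.txt")
    println(s"input partitions: ${file.partitions.size}")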

Matei



Re: Ideal core count within a single JVM

Posted by Matei Zaharia <ma...@gmail.com>.
I see, so here might be the problem. With more cores there's less memory available per core, and many of your threads are now doing external hashing (spilling data to disk), as evidenced by the calls to ExternalAppendOnlyMap.spill. Maybe with 10 threads there was enough memory per task to do all of its hashing in memory. It's true, though, that these threads appear to be CPU-bound, largely due to Java serialization; you could get this to run quite a bit faster using Kryo. However, that won't eliminate the spilling issue here.
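
The switch itself is one config line (a sketch; the master and app name are placeholders, and see the tuning guide for registering your own classes with Kryo):

    import org.apache.spark.SparkConf

    // Use Kryo instead of the default Java serialization.
    val conf = new SparkConf()
      .setMaster("local[10]")   // placeholder master
      .setAppName("kryo-test")  // placeholder name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")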

Matei



Re: Ideal core count within a single JVM

Posted by "lokesh.gidra" <lo...@gmail.com>.
I am only varying 'N' in local[N]. I thought that by increasing N, Spark would automatically use more parallel tasks. Isn't that so? Can you please tell me how I can change the number of parallel tasks?

For me, there are hardly any threads in the BLOCKED state in the jstack output. In 'top' I see my application consuming all 48 cores all the time with N=48.

I am attaching two jstack outputs that I took while the application was running.


Lokesh

lessoutput3.lessoutput3
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput3.lessoutput3>  
lessoutput4.lessoutput4
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput4.lessoutput4>  





Re: Ideal core count within a single JVM

Posted by Matei Zaharia <ma...@gmail.com>.
Are you increasing the number of parallel tasks along with the core count? With more tasks there will be more data communicated, and hence more calls to these functions.
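
If not, you can pass an explicit partition count to the input and shuffle operations, or set a default (a sketch; the numbers, path, and app name are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[48]")
      .setAppName("parallelism-test")           // placeholder name
      .set("spark.default.parallelism", "96")   // default task count for shuffles
    val sc = new SparkContext(conf)
    // Explicit partition counts: at least 96 input splits and 96 reduce tasks.
    val counts = sc.textFile("/data/input.txt", 96)  // illustrative path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _, 96)
    println(s"reduce partitions: ${counts.partitions.size}")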

Unfortunately contention is kind of hard to measure, since often the result is that you see many cores idle as they're waiting on a lock. ObjectOutputStream should not lock anything, but if it's blocking on a FileOutputStream to write data, that could be a problem. Look for "BLOCKED" threads in a stack trace too (do jstack on your Java process and look at the TaskRunner threads).

Incidentally, you can probably speed this up by using Kryo serialization instead of Java serialization (see http://spark.apache.org/docs/latest/tuning.html). That might make it less CPU-bound, and it would also create less I/O.

Matei



Re: Ideal core count within a single JVM

Posted by "lokesh.gidra" <lo...@gmail.com>.
Thanks a lot for the reply.

Actually, I am running the SparkPageRank example with a 160 GB heap (I am sure the problem is not GC, because the excess time is being spent in Java code only).

What I have observed in the JProfiler and OProfile outputs is that the time spent in the following two functions increases substantially with increasing N:

1) java.io.ObjectOutputStream.writeObject0
2) scala.Tuple2.hashCode

I don't think the Linux file system could be causing the issue: my machine has 256 GB of RAM, and I am using a tmpfs for java.io.tmpdir. So I don't think there is much disk access involved, if that is what you meant.
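
In case it matters, the Spark-side equivalent would be something like this (a sketch; the mount point and app name are illustrative, and my understanding is that spark.local.dir defaults to java.io.tmpdir, so my tmpfs setting should already cover Spark's spill files):

    import org.apache.spark.SparkConf

    // spark.local.dir controls where Spark writes shuffle and spill files;
    // it defaults to java.io.tmpdir. The mount point below is illustrative.
    val conf = new SparkConf()
      .setMaster("local[48]")
      .setAppName("tmpfs-scratch")
      .set("spark.local.dir", "/dev/shm/spark-scratch")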

Regards,
Lokesh




Re: Ideal core count within a single JVM

Posted by Matei Zaharia <ma...@gmail.com>.
Probably something like 8 is best on this kind of machine. What operations are you doing, though? It's possible that something else becomes a contention point at 48 threads; a common one we've seen is the Linux file system.

Matei
