Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2014/03/10 23:18:41 UTC

computation slows down 10x because of cached RDDs

hello all,
i am observing a strange result. i have a computation that i run on a
cached RDD in spark-standalone. it typically takes about 4 seconds.

but when other RDDs that are not relevant to the computation at hand are
cached in memory (in the same spark context), the computation takes 40
seconds or more.

the problem seems to be GC time, which goes from milliseconds to tens of
seconds.

note that my issue is not that memory is full. i have cached about 14G in
RDDs with 66G available across workers for the application. also my
computation did not push any cached RDD out of memory.

any ideas?

Re: computation slows down 10x because of cached RDDs

Posted by Andrew Ash <an...@andrewash.com>.
Note that calling System.gc() is just a suggestion to the JVM that it
should run a garbage collection and doesn't force it right then 100% of the
time.

http://stackoverflow.com/questions/1481178/forcing-garbage-collection-in-java
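A quick local check of that point (a minimal sketch in the spirit of the thread's snippets; none of it is from the original mails): the JVM is free to defer the collection, and with -XX:+DisableExplicitGC on the command line the call becomes a guaranteed no-op.

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// total collections across all collectors in this JVM so far
def gcCount: Long =
  ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionCount).sum

val before = gcCount
System.gc() // only a hint; a guaranteed no-op under -XX:+DisableExplicitGC
val after = gcCount
println(s"collections before: $before, after: $after")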



Re: computation slows down 10x because of cached RDDs

Posted by Koert Kuipers <ko...@tresata.com>.
hey matei,
ok when i switch to java 7 with G1 the GC time for all the "quick" tasks
goes from 150ms to 10ms, but the slow ones stay just as slow. all i did was
add -XX:+UseG1GC so maybe that's wrong, i still have to read up on G1.

an example of GC in a slow task is below.
best, koert


[GC pause (young), 0.0100070 secs]
   [Parallel Time: 7.3 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 2889329.4, Avg: 2889329.8, Max: 2889330.0, Diff: 0.6]
      [Ext Root Scanning (ms): Min: 2.0, Avg: 2.5, Max: 3.2, Diff: 1.2, Sum: 44.2]
      [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 3.5, Diff: 3.5, Sum: 5.8]
         [Processed Buffers: Min: 0, Avg: 3.3, Max: 14, Diff: 14, Sum: 59]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 0.2, Avg: 0.7, Max: 1.0, Diff: 0.8, Sum: 12.0]
      [Termination (ms): Min: 0.0, Avg: 2.4, Max: 2.6, Diff: 2.6, Sum: 43.8]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.5]
      [GC Worker Total (ms): Min: 5.7, Avg: 5.9, Max: 6.3, Diff: 0.6, Sum: 106.4]
      [GC Worker End (ms): Min: 2889335.7, Avg: 2889335.7, Max: 2889335.7, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 1.1 ms]
   [Other: 1.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.0 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.3 ms]
   [Eden: 20.0M(752.0M)->0.0B(304.0M) Survivors: 2048.0K->2048.0K Heap: 5917.7M(6144.0M)->5898.8M(6144.0M)]
 [Times: user=0.12 sys=0.00, real=0.01 secs]
[Full GC 5898M->2057M(6144M), 5.7978580 secs]
   [Eden: 2048.0K(304.0M)->0.0B(1064.0M) Survivors: 2048.0K->0.0B Heap: 5898.9M(6144.0M)->2057.5M(6144.0M)]
 [Times: user=9.24 sys=0.00, real=5.80 secs]
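Reading the trace above: the young pause really is down to ~10ms, but the second entry is a stop-the-world full collection that ran with the heap almost entirely full (5898.8M of 6144.0M), which looks like G1 falling back from concurrent collection. One hedged way to leave more headroom, assuming Spark 0.9's SparkConf API (the app name and the 0.5 value below are illustrative, not from the thread): cap the cache below the default spark.storage.memoryFraction of 0.6.

import org.apache.spark.{SparkConf, SparkContext}

// sketch: reserve more of the executor heap for transient objects by
// capping the RDD cache at 50% of the heap instead of the default 60%
val conf = new SparkConf()
  .setAppName("gc-headroom") // illustrative name
  .set("spark.storage.memoryFraction", "0.5")
val sc = new SparkContext(conf)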




Re: computation slows down 10x because of cached RDDs

Posted by Matei Zaharia <ma...@gmail.com>.
Yeah, System.gc() is a suggestion but in practice it does invoke full GCs on the Sun JVM.

Matei



Re: computation slows down 10x because of cached RDDs

Posted by Koert Kuipers <ko...@tresata.com>.
hey matei,
ha i will definitely try that one! looks like a total hack... i might just
schedule it defensively after the precaching of rdds.

also trying java 7 with g1



Re: computation slows down 10x because of cached RDDs

Posted by Matei Zaharia <ma...@gmail.com>.
Right, that’s it. I think what happened is the following: all the nodes generated some garbage that put them very close to the threshold for a full GC in the first few runs of the program (when you cached the RDDs), but on the subsequent queries, only a few nodes are hitting full GC per query, so every query sees a slowdown and the problem persists for a while. You can try manually forcing a GC on the nodes like this after you do your loading:

sc.parallelize(1 to numNodes, numNodes).foreach(x => System.gc())

Where numNodes is your number of nodes. (Actually it’s also okay to just make this higher, System.gc() returns fast when there’s no GC to run.)

Matei
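
A hedged illustration of the oversubscription variant from the parenthetical above; the counts are made up, and the only goal is that every executor runs at least one of the no-op tasks:

// numNodes is whatever your cluster really has; 10 is an illustrative value.
// Using several times more partitions than nodes makes it very likely that
// every executor receives at least one task; the extra System.gc() calls
// return quickly when there is nothing to collect.
val numNodes = 10
sc.parallelize(1 to numNodes * 4, numNodes * 4).foreach(_ => System.gc())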



Re: computation slows down 10x because of cached RDDs

Posted by Koert Kuipers <ko...@tresata.com>.
hey matei,
most tasks have GC times of 200ms or less, and then a few tasks take many
seconds. example GC activity for a slow one:

[GC [PSYoungGen: 1051814K->262624K(1398144K)] 3789259K->3524429K(5592448K), 0.0986800 secs] [Times: user=1.53 sys=0.01, real=0.10 secs]
[GC [PSYoungGen: 786935K->524512K(1398144K)] 4048741K->4048762K(5592448K), 0.1132490 secs] [Times: user=1.70 sys=0.01, real=0.11 secs]
[Full GC [PSYoungGen: 524512K->0K(1398144K)] [PSOldGen: 3524250K->2207344K(4194304K)] 4048762K->2207344K(5592448K) [PSPermGen: 56545K->54639K(83968K)], 7.7059350 secs] [Times: user=7.71 sys=0.00, real=7.70 secs]


so looks like i am hit by stop-the-world gc?
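One hedged way to get per-executor numbers like these without grepping worker logs (a sketch; numNodes is an assumption about the cluster size, and the println output lands in each executor's stdout log rather than on the driver):

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// print cumulative GC counts and times from inside each executor JVM
val numNodes = 10 // illustrative; use your actual worker count
sc.parallelize(1 to numNodes, numNodes).foreach { _ =>
  ManagementFactory.getGarbageCollectorMXBeans.asScala.foreach { gc =>
    println(s"${gc.getName}: ${gc.getCollectionCount} collections, ${gc.getCollectionTime} ms total")
  }
}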



Re: computation slows down 10x because of cached RDDs

Posted by Koert Kuipers <ko...@tresata.com>.
hey matei,
it happens repeatedly.

we are currently running on java 6 with spark 0.9.

i will add -XX:+PrintGCDetails and collect details, and also look into java
7 G1. thanks
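
For anyone following along: in the spark 0.9 era, JVM flags like these generally had to go through SPARK_JAVA_OPTS so the executor JVMs pick them up; flags passed only to the driver do not reach the workers. A sketch, assuming a standalone deployment where the workers read conf/spark-env.sh (the exact flag set is illustrative):

# conf/spark-env.sh
export SPARK_JAVA_OPTS="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"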


Re: computation slows down 10x because of cached RDDs

Posted by Matei Zaharia <ma...@gmail.com>.
Does this happen repeatedly if you keep running the computation, or just the first time? It may take time to move these Java objects to the old generation the first time you run queries, which could lead to a GC pause that also slows down the small queries.

If you can run with -XX:+PrintGCDetails in your Java options, it would also be good to see what percent of each GC generation is used.

The concurrent mark-and-sweep GC -XX:+UseConcMarkSweepGC or the G1 GC in Java 7 (-XX:+UseG1GC) might also avoid these pauses by GCing concurrently with your application threads.

Matei
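
To act on the "what percent of each GC generation is used" question without shipping GC logs around, a hedged per-executor sketch (numNodes is an assumption; the pool names differ by collector, e.g. "PS Old Gen" under the parallel GC):

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// print each memory pool's occupancy from inside every executor JVM;
// output goes to the executors' stdout logs
val numNodes = 10 // illustrative; use your actual worker count
sc.parallelize(1 to numNodes, numNodes).foreach { _ =>
  ManagementFactory.getMemoryPoolMXBeans.asScala.foreach { pool =>
    val u = pool.getUsage
    val pct = if (u.getMax > 0) 100.0 * u.getUsed / u.getMax else -1.0
    println(f"${pool.getName}: ${u.getUsed}%d of ${u.getMax}%d bytes ($pct%.1f%%)")
  }
}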
