Posted to user@spark.apache.org by Jesse F Chen <jf...@us.ibm.com> on 2016/01/28 20:49:14 UTC
streaming in 1.6.0 slower than 1.5.1
I ran the same streaming application (compiled individually for 1.5.1 and
1.6.0) that processes 5-second tweet batches.
I noticed two things:
1. 10% regression in 1.6.0 vs 1.5.1
Spark v1.6.0: 1,564 tweets/s
Spark v1.5.1: 1,747 tweets/s
2. 1.6.0 streaming seems to have a memory leak.
In 1.6.0, processing time gradually increases and eventually exceeds 5 seconds,
so batches start to queue up.
In 1.5.1 there is no such slowdown. The chart attached to the original message
(not preserved in this plain text version) showed the increasing scheduling
delay in 1.6.
I captured heap dumps in both versions and compared them. I noticed that
byte arrays (class [B) use about 50X more space in 1.6.0.
Here are some top classes in heap histogram and references.
Heap Histogram: All Classes (excluding platform)

1.6.0 Streaming
Class                             Instance Count      Total Size
class [B                                    8453   3,227,649,599
class [C                                   44682       4,255,502
class java.lang.reflect.Method              9059       1,177,670

1.5.1 Streaming
Class                             Instance Count      Total Size
class [B                                    5095      62,938,466
class [C                                  130482      12,844,182
class java.lang.String                    130171       1,562,052

References by Type
1.6.0: class [B [0x640039e38]
1.5.1: class [B [0x6c020bb08]

Referrers by Type, 1.6.0
Class                                  Count
java.nio.HeapByteBuffer                 3239
sun.security.util.DerInputBuffer        1233
sun.security.util.ObjectIdentifier       620
[Ljava.lang.Object;                      408

Referrers by Type, 1.5.1
Class                                  Count
sun.security.util.DerInputBuffer        1233
sun.security.util.ObjectIdentifier       620
[[B                                      397
java.lang.reflect.Method                 326
----
The total size by class B is 3GB in 1.5.1 and only 60MB in 1.6.0.
java.nio.HeapByteBuffer, the top referrer of class [B in 1.6.0, did not show up
among the top referrers in 1.5.1.
I have also placed the jstack output for 1.5.1 and 1.6.0 online. You can get
them here:
https://ibm.box.com/sparkstreaming-jstack160
https://ibm.box.com/sparkstreaming-jstack151
Jesse
Re: streaming in 1.6.0 slower than 1.5.1
Posted by Jesse F Chen <jf...@us.ibm.com>.
Yes, Ted, thanks for the corrections! 3GB is from 1.6.0.
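@Ryan, here are the operators used:

    // Per-tweet score, bucketed into 1 / -1 / 0, then counted per bucket
    // over a sliding window (the second function is the inverse reduce).
    val sentiCount = sentiTweets.map(t => t._2._1 - t._2._2)
      .map(score => if (score > posThreshold) 1
                    else if (score < negThreshold) -1
                    else 0)
      .map(score => (score, 1))
      .reduceByKeyAndWindow(_ + _, _ - _,
        Seconds(ReduceWindow * 60),
        Seconds(BatchWindow))

Also used reduceByWindow:

    // Running pairwise sums over the same window, again with an inverse reduce.
    val avgSenti = sentiTweets.map(_._2).reduceByWindow(
        (x, y) => (x._1 + y._1, x._2 + y._2),
        (x, y) => (x._1 - y._1, x._2 - y._2),
        Seconds(ReduceWindow * 60),
        Seconds(BatchWindow))
      .map(("sentiment", _))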
Basically I am trying to analyze sentiments in incoming tweets by computing
scores on batches.
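
For readers unfamiliar with the inverse-reduce form of the window operators, here is a minimal plain-Scala sketch (no Spark required) of the two ideas involved: bucketing a score into 1 / -1 / 0, and updating a window by adding the batch that enters it and subtracting the batch that leaves it. The names WindowSketch, classify, countByLabel and slide are hypothetical, and the Double score type is an assumption; the thread does not show the element types.

```scala
// Plain-Scala illustration only; not the Spark implementation.
object WindowSketch {
  // Bucket a raw score into 1 (above posThreshold), -1 (below negThreshold) or 0.
  def classify(score: Double, posThreshold: Double, negThreshold: Double): Int =
    if (score > posThreshold) 1
    else if (score < negThreshold) -1
    else 0

  // Count how often each label occurs in one batch.
  def countByLabel(labels: Seq[Int]): Map[Int, Int] =
    labels.groupBy(identity).map { case (label, hits) => label -> hits.size }

  // Incremental window update: add the counts of the batch entering the window
  // and subtract the counts of the batch leaving it, mirroring the (_+_, _-_)
  // pair passed to reduceByKeyAndWindow.
  def slide(window: Map[Int, Int],
            entering: Map[Int, Int],
            leaving: Map[Int, Int]): Map[Int, Int] = {
    val keys = window.keySet ++ entering.keySet ++ leaving.keySet
    keys.map { k =>
      k -> (window.getOrElse(k, 0) + entering.getOrElse(k, 0) - leaving.getOrElse(k, 0))
    }.toMap
  }
}
```

The point of the inverse function is that each slide touches only the entering and leaving batches rather than re-reducing the whole window.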
Agreed that it could be memory exhaustion (not necessarily a leak), but from
the end user's point of view, the app fails in 1.6.0.
In 1.6, there is automatic memory management, which no longer honors the split
between execution memory and cache memory.
Is there a way to "disable" that in 1.6 so I can test the same code and see
whether it makes any difference? Perhaps an area worth looking into?
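
One way to run that comparison, offered as a sketch rather than a confirmed fix: the Spark 1.6 configuration documentation describes a spark.memory.useLegacyMode setting that restores the fixed storage/shuffle split used in 1.5 and earlier. The app name below is hypothetical, and the fraction values are the documented legacy defaults, not tuned numbers. Whether this changes the slowdown reported here is untested.

```scala
import org.apache.spark.SparkConf

// Sketch only: ask Spark 1.6 to use the pre-1.6 static memory manager.
val conf = new SparkConf()
  .setAppName("TweetSentiment") // hypothetical app name
  .set("spark.memory.useLegacyMode", "true")
  // The two fractions below are only read when legacy mode is enabled;
  // 0.6 and 0.2 are the documented legacy defaults.
  .set("spark.storage.memoryFraction", "0.6")
  .set("spark.shuffle.memoryFraction", "0.2")
```

The same properties can also be passed with --conf on spark-submit, which avoids recompiling the application between runs.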
Thanks for responding so quickly!
JESSE CHEN
Big Data Performance | IBM Analytics
Office: 408 463 2296
Mobile: 408 828 9068
Email: jfchen@us.ibm.com
From: "Shixiong(Ryan) Zhu" <sh...@databricks.com>
To: Jesse F Chen/San Francisco/IBM@IBMUS
Cc: user <us...@spark.apache.org>, Ted Yu <yu...@gmail.com>
Date: 01/28/2016 12:04 PM
Subject: Re: streaming in 1.6.0 slower than 1.5.1
Hey Jesse,
Could you list the operators you are using?
For the heap dump, it may not be a real memory leak. Since batches started
to queue up, memory usage is expected to increase.
On Thu, Jan 28, 2016 at 11:54 AM, Ted Yu <yu...@gmail.com> wrote:
bq. The total size by class B is 3GB in 1.5.1 and only 60MB in 1.6.0.
From the information you posted, it seems the above is backwards.
BTW [B is byte[], not class B.
FYI
Re: streaming in 1.6.0 slower than 1.5.1
Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
Hey Jesse,
Could you list the operators you are using?
For the heap dump, it may not be a real memory leak. Since batches started
to queue up, memory usage is expected to increase.
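
To make the queueing argument concrete, here is a small plain-Scala sketch of how scheduling delay accumulates once processing time exceeds the batch interval. It assumes batches are processed strictly one at a time and is not Spark's scheduler; BacklogSketch and schedulingDelays are hypothetical names, and the numbers in the usage note are made up for illustration.

```scala
// Toy model only: one batch is processed at a time, in arrival order.
object BacklogSketch {
  // Scheduling delay (seconds) seen by each batch, given the batch interval
  // and the processing time of each batch.
  def schedulingDelays(batchInterval: Double, processingTimes: Seq[Double]): Seq[Double] =
    processingTimes
      .scanLeft(0.0) { (delay, proc) =>
        // A batch finishes `delay + proc` seconds after it arrived. The next
        // batch arrived `batchInterval` seconds later, so it waits for the rest.
        math.max(0.0, delay + proc - batchInterval)
      }
      .init // drop the delay computed for a batch that does not exist
}
```

With a 5-second interval and processing times of 4, 6, 6 and 6 seconds, the delays come out as 0, 0, 1 and 2 seconds: every batch that takes longer than the interval pushes all later batches back, and the data for the waiting batches stays in memory in the meantime.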
Re: streaming in 1.6.0 slower than 1.5.1
Posted by Ted Yu <yu...@gmail.com>.
bq. The total size by class B is 3GB in 1.5.1 and only 60MB in 1.6.0.
From the information you posted, it seems the above is backwards.
BTW [B is byte[], not class B.
FYI