You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Xinyu Liu <xi...@linkedin.com> on 2015/12/08 03:30:22 UTC

Review Request 41071: SAMZA-843: fix heap usage increase caused by container timer change

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41071/
-----------------------------------------------------------

Review request for samza and Yi Pan (Data Infrastructure).


Repository: samza


Description
-------

After the change of container timer metrics (chooseNs, windowNs, processNs, and commitNs) from millisecond to nanosecond, we noticed a dramatic increase of memory heap usage in one of our production job. After investigation we found that the SlidingTimeWindowReservoir.update(duration) will be called much more frequently due to the duration is non-zero after the nanosecond change (In contrast, it is often zero when using millisecond). Within the 5-minute window, the storage inside SlidingTimeWindowReservoir increases a lot for a high qps job (for our job with around 10K qps, it increases the heap from <5M to 100M). It causes long GCs and degrades the job performance. The fix will make the SlidingTimeWindowReservoir collision buffer default to be 1 and configurable from the constructor.


Diffs
-----

  samza-api/src/main/java/org/apache/samza/metrics/SlidingTimeWindowReservoir.java df543599771b7cda3be6ea702a85091cf61883d7 
  samza-api/src/main/java/org/apache/samza/metrics/Timer.java b49d14758116660ac6b26cc7e2459b293343e47e 
  samza-api/src/test/java/org/apache/samza/metrics/TestSlidingTimeWindowReservoir.java d392b3258b7aca4323d663159c4e545e113277bb 
  samza-api/src/test/java/org/apache/samza/metrics/TestTimer.java 63c183f9283c58006de23a342e61717e062030c3 
  samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala b4d6f351f97a7c3a29c133b4ba3876ce4b6baca2 

Diff: https://reviews.apache.org/r/41071/diff/


Testing
-------


Thanks,

Xinyu Liu


Re: Review Request 41071: SAMZA-843: fix heap usage increase caused by container timer change

Posted by Boris Shkolnik <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41071/#review109423
-----------------------------------------------------------

Ship it!


We should consider more permanent fix later - may be using preallocated storage for timer, to avoid allocation and deallocation of the map values.

- Boris Shkolnik


On Dec. 8, 2015, 2:30 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41071/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2015, 2:30 a.m.)
> 
> 
> Review request for samza and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After the change of container timer metrics (chooseNs, windowNs, processNs, and commitNs) from millisecond to nanosecond, we noticed a dramatic increase of memory heap usage in one of our production job. After investigation we found that the SlidingTimeWindowReservoir.update(duration) will be called much more frequently due to the duration is non-zero after the nanosecond change (In contrast, it is often zero when using millisecond). Within the 5-minute window, the storage inside SlidingTimeWindowReservoir increases a lot for a high qps job (for our job with around 10K qps, it increases the heap from <5M to 100M). It causes long GCs and degrades the job performance. The fix will make the SlidingTimeWindowReservoir collision buffer default to be 1 and configurable from the constructor.
> 
> 
> Diffs
> -----
> 
>   samza-api/src/main/java/org/apache/samza/metrics/SlidingTimeWindowReservoir.java df543599771b7cda3be6ea702a85091cf61883d7 
>   samza-api/src/main/java/org/apache/samza/metrics/Timer.java b49d14758116660ac6b26cc7e2459b293343e47e 
>   samza-api/src/test/java/org/apache/samza/metrics/TestSlidingTimeWindowReservoir.java d392b3258b7aca4323d663159c4e545e113277bb 
>   samza-api/src/test/java/org/apache/samza/metrics/TestTimer.java 63c183f9283c58006de23a342e61717e062030c3 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala b4d6f351f97a7c3a29c133b4ba3876ce4b6baca2 
> 
> Diff: https://reviews.apache.org/r/41071/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 41071: SAMZA-843: fix heap usage increase caused by container timer change

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41071/#review109387
-----------------------------------------------------------

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On Dec. 8, 2015, 2:30 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41071/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2015, 2:30 a.m.)
> 
> 
> Review request for samza and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After the change of container timer metrics (chooseNs, windowNs, processNs, and commitNs) from millisecond to nanosecond, we noticed a dramatic increase of memory heap usage in one of our production job. After investigation we found that the SlidingTimeWindowReservoir.update(duration) will be called much more frequently due to the duration is non-zero after the nanosecond change (In contrast, it is often zero when using millisecond). Within the 5-minute window, the storage inside SlidingTimeWindowReservoir increases a lot for a high qps job (for our job with around 10K qps, it increases the heap from <5M to 100M). It causes long GCs and degrades the job performance. The fix will make the SlidingTimeWindowReservoir collision buffer default to be 1 and configurable from the constructor.
> 
> 
> Diffs
> -----
> 
>   samza-api/src/main/java/org/apache/samza/metrics/SlidingTimeWindowReservoir.java df543599771b7cda3be6ea702a85091cf61883d7 
>   samza-api/src/main/java/org/apache/samza/metrics/Timer.java b49d14758116660ac6b26cc7e2459b293343e47e 
>   samza-api/src/test/java/org/apache/samza/metrics/TestSlidingTimeWindowReservoir.java d392b3258b7aca4323d663159c4e545e113277bb 
>   samza-api/src/test/java/org/apache/samza/metrics/TestTimer.java 63c183f9283c58006de23a342e61717e062030c3 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala b4d6f351f97a7c3a29c133b4ba3876ce4b6baca2 
> 
> Diff: https://reviews.apache.org/r/41071/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 41071: SAMZA-843: fix heap usage increase caused by container timer change

Posted by Boris Shkolnik <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41071/#review109415
-----------------------------------------------------------



samza-api/src/main/java/org/apache/samza/metrics/SlidingTimeWindowReservoir.java (line 51)
<https://reviews.apache.org/r/41071/#comment168918>

    nit.can we rename it to something like collistionBufferFactor. Otherwise the name is somewhat confusing.



samza-api/src/main/java/org/apache/samza/metrics/SlidingTimeWindowReservoir.java (line 148)
<https://reviews.apache.org/r/41071/#comment168923>

    nit.please add comment here - this logic is difficult to understand.



samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala (line 194)
<https://reviews.apache.org/r/41071/#comment168927>

    nit. Do you need this 'c' here or the previous like will just return it?


- Boris Shkolnik


On Dec. 8, 2015, 2:30 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41071/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2015, 2:30 a.m.)
> 
> 
> Review request for samza and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After the change of container timer metrics (chooseNs, windowNs, processNs, and commitNs) from millisecond to nanosecond, we noticed a dramatic increase of memory heap usage in one of our production job. After investigation we found that the SlidingTimeWindowReservoir.update(duration) will be called much more frequently due to the duration is non-zero after the nanosecond change (In contrast, it is often zero when using millisecond). Within the 5-minute window, the storage inside SlidingTimeWindowReservoir increases a lot for a high qps job (for our job with around 10K qps, it increases the heap from <5M to 100M). It causes long GCs and degrades the job performance. The fix will make the SlidingTimeWindowReservoir collision buffer default to be 1 and configurable from the constructor.
> 
> 
> Diffs
> -----
> 
>   samza-api/src/main/java/org/apache/samza/metrics/SlidingTimeWindowReservoir.java df543599771b7cda3be6ea702a85091cf61883d7 
>   samza-api/src/main/java/org/apache/samza/metrics/Timer.java b49d14758116660ac6b26cc7e2459b293343e47e 
>   samza-api/src/test/java/org/apache/samza/metrics/TestSlidingTimeWindowReservoir.java d392b3258b7aca4323d663159c4e545e113277bb 
>   samza-api/src/test/java/org/apache/samza/metrics/TestTimer.java 63c183f9283c58006de23a342e61717e062030c3 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala b4d6f351f97a7c3a29c133b4ba3876ce4b6baca2 
> 
> Diff: https://reviews.apache.org/r/41071/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>