Posted to commits@samza.apache.org by "Eric Honer (Jira)" <ji...@apache.org> on 2023/04/03 19:54:00 UTC

[jira] [Commented] (SAMZA-2778) Make AzureBlobOutputStream buffer initialization size configurable.

    [ https://issues.apache.org/jira/browse/SAMZA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708097#comment-17708097 ] 

Eric Honer commented on SAMZA-2778:
-----------------------------------

[PR#1662|https://github.com/apache/samza/pull/1662] submitted for review.

> Make AzureBlobOutputStream buffer initialization size configurable.
> -------------------------------------------------------------------
>
>                 Key: SAMZA-2778
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2778
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Aditya Toomula
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The existing {{AzureBlobOutputStream}} uses a {{ByteArrayOutputStream}} to buffer messages until {{flush()}}, and each new buffer is initialized to 10MB (Azure's maximum block size). This can cause issues with the G1 garbage collector (the default in Java 11) because these buffers are treated as humongous objects. G1 divides the heap into regions and considers any object larger than half a region to be humongous. Such objects are allocated directly in the old generation and given an entire region to themselves, which prevents the GC from using the remaining space in that region for other allocations. If the object is larger than a region, multiple contiguous regions are allocated. With a large number of buffers, the JVM can hit OOMs if no regions are empty when a new {{ByteArrayOutputStream}} is created. The JVM terminates because {{new}} requires immediate memory allocation and cannot wait for GC.
> GC effectiveness can be improved if the {{ByteArrayOutputStream}} is allowed to grow as messages are added, which delays or even avoids its being treated as humongous. These buffers can still become humongous objects, but only once they grow to a sufficient size. Clients can customize the initial size to suit their systems.
> h3. References
>  * "[Humongous Objects and Humongous Allocations|https://www.oracle.com/technical-resources/articles/java/g1gc.html#:~:text=Humongous%20Objects%20and%20Humongous%20Allocations,generation%20into%20%22Humongous%20regions%22.&text=A%20full%20garbage%20collection%20cycle%20compacts%20Humongous%20objects%20in%20place.]"
>  * "[Part 1: Introduction to the G1 Garbage Collector|https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector]"
>  * "[What's the deal with humongous objects in Java?|https://devblogs.microsoft.com/java/whats-the-deal-with-humongous-objects-in-java/]"
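
For illustration only, the sizing reasoning in the description can be sketched in Java as follows. This is not the code from the PR: the class name {{BufferSizing}}, its methods, and the clamping behavior are hypothetical, and the humongous check simply follows the "larger than half a region" wording used in the issue.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Samza: shows why a smaller initial
// buffer size can keep a new buffer below G1's humongous threshold.
final class BufferSizing {
    // 10 MB, the block size cited in the issue description.
    static final int MAX_BLOCK_SIZE_BYTES = 10 * 1024 * 1024;

    // Per the issue text, an object larger than half a G1 region is humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Clamp a client-supplied initial size to the range [1, MAX_BLOCK_SIZE_BYTES].
    static int initialBufferSize(int configuredBytes) {
        return Math.max(1, Math.min(configuredBytes, MAX_BLOCK_SIZE_BYTES));
    }

    // The buffer starts small and grows on demand as messages are written.
    static ByteArrayOutputStream newBuffer(int configuredBytes) {
        return new ByteArrayOutputStream(initialBufferSize(configuredBytes));
    }
}
```

Assuming a 4MB region size, {{BufferSizing.isHumongous(10L * 1024 * 1024, 4L * 1024 * 1024)}} is true for the current 10MB initialization, while a 32KB initial size stays well under the threshold until the buffer has grown.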



--
This message was sent by Atlassian Jira
(v8.20.10#820010)