Posted to issues@ignite.apache.org by "Vladimir Steshin (Jira)" <ji...@apache.org> on 2022/09/21 20:11:00 UTC

[jira] [Comment Edited] (IGNITE-17735) Datastreamer may consume whole heap.

    [ https://issues.apache.org/jira/browse/IGNITE-17735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607891#comment-17607891 ] 

Vladimir Steshin edited comment on IGNITE-17735 at 9/21/22 8:10 PM:
--------------------------------------------------------------------

A DataStreamer with the individual receiver and an ATOMIC/PRIMARY_SYNC persistent cache may consume the whole heap. The test case is simple: 2 or 3 servers, 1 or 2 backups, and a DataStreamer on a client loading a significant amount of data with around 1G of heap. Tested with 6 (16) CPUs and 6-16 streamer threads.

See `JmhStreamerReceiverBenchmark.bchIndividual_512_1()`, `DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`.
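
For illustration, a minimal sketch of that scenario. The class name `StreamerHeapRepro`, the cache name "load", the entry count and the 512-byte values are illustrative; the real setups are in the tests above:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.internal.processors.datastreamer.DataStreamerCacheUpdaters;

public class StreamerHeapRepro {
    /** Loads from an already-started client node; 2-3 servers with persistence enabled run separately. */
    static void load(Ignite client) {
        CacheConfiguration<Integer, byte[]> ccfg = new CacheConfiguration<Integer, byte[]>("load")
            .setAtomicityMode(CacheAtomicityMode.ATOMIC)
            .setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC)
            .setBackups(2);

        client.getOrCreateCache(ccfg);

        try (IgniteDataStreamer<Integer, byte[]> streamer = client.dataStreamer("load")) {
            // Individual receiver: one cache.put() per entry (the internal updater the benchmark uses).
            streamer.receiver(DataStreamerCacheUpdaters.individual());

            for (int i = 0; i < 2_000_000; i++)
                streamer.addData(i, new byte[512]); // 512-byte values, as in bchIndividual_512_1
        }
    }
}
```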

The problem is that the streamer doesn't wait for backup updates on the primary node and keeps sending update batches again and again. The individual receiver uses cache.put(). Every put creates a future for the primary update, plus a future and an update request for the backups. Nodes start accumulating these per-update objects in the heap (`processDhtAtomicUpdateRequest()`).
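
In essence, the individual receiver behaves like the simplified stand-in below: one cache.put() per entry. `IndividualLikeReceiver` is an illustrative name, not the internal class:

```java
import java.util.Collection;
import java.util.Map;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.stream.StreamReceiver;

/** Simplified stand-in for the individual receiver: a separate cache.put() per entry. */
class IndividualLikeReceiver<K, V> implements StreamReceiver<K, V> {
    @Override public void receive(IgniteCache<K, V> cache, Collection<Map.Entry<K, V>> entries)
        throws IgniteException {
        // Each put is a primary-update future plus a DHT update request per backup;
        // with PRIMARY_SYNC the put returns before the backups acknowledge.
        for (Map.Entry<K, V> e : entries)
            cache.put(e.getKey(), e.getValue());
    }
}
```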

There is no reason to send more than 2-4 unacknowledged batches, because they get stuck at disk writes, WAL writes, page replacements, WAL rolling, GCs and so on. Why so many parallel batches by default, especially for persistent caches? `IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER=8` looks weird to me: with 8 CPUs and 16 streamer threads I get 8 * 16 = 128 parallel batches.

Solution: reduce the default maximum number of parallel batches per node, and make this value depend on whether persistence is enabled.
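
Until the default changes, a workaround sketch is to cap the in-flight batches explicitly via the public `perNodeParallelOperations()` setter (the cap of 4 and the names below are illustrative, not a recommendation from this ticket):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class CappedStreamer {
    /** Same load as above, but with a capped number of in-flight batches per node. */
    static void load(Ignite client) {
        try (IgniteDataStreamer<Integer, byte[]> streamer = client.dataStreamer("load")) {
            // Default is DFLT_PARALLEL_OPS_MULTIPLIER (8) * streamer threads, e.g. 128 for 16 threads;
            // cap it to a few unacknowledged batches per node instead.
            streamer.perNodeParallelOperations(4);

            for (int i = 0; i < 2_000_000; i++)
                streamer.addData(i, new byte[512]);
        }
    }
}
```

A small fixed cap keeps the backpressure on the streamer instead of letting per-update objects pile up in the nodes' heaps.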

Some JFR screenshots are attached.


was (Author: vladsz83):
A DataStreamer with the individual receiver and an ATOMIC/PRIMARY_SYNC persistent cache may consume the whole heap. The test case is simple: 2 or 3 servers, 1 or 2 backups, and a DataStreamer on a client loading a significant amount of data with around 1G of heap. Tested with 6 (16) CPUs and 6-16 streamer threads.

See `JmhStreamerReceiverBenchmark.bchIndividual_512_1()`, `DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`.

The problem is that the streamer doesn't wait for backup updates from the primary node and keeps sending update batches again and again. The individual receiver uses cache.put(). Every put creates a future and an update request for the backups. Nodes start accumulating these per-update objects in the heap (`processDhtAtomicUpdateRequest()`).

There is no reason to send more than 2-4 unacknowledged batches, because they get stuck at disk writes, WAL writes, page replacements, WAL rolling, GCs and so on. Why so many parallel batches by default, especially for persistent caches? `IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER=8` looks weird to me: with 8 CPUs and 16 streamer threads I get 8 * 16 = 128 parallel batches.

Solution: reduce the default maximum number of parallel batches per node, and make this value depend on whether persistence is enabled.

Some JFR screenshots are attached.

> Datastreamer may consume whole heap.
> ------------------------------------
>
>                 Key: IGNITE-17735
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17735
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>         Attachments: DS_heap_no_events_no_wal.png, DS_heap_no_events_no_wal_2.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)