Posted to issues@ignite.apache.org by "Aleksey Plekhanov (Jira)" <ji...@apache.org> on 2024/03/25 14:53:00 UTC

[jira] [Resolved] (IGNITE-21478) OOM crash with unstable topology

     [ https://issues.apache.org/jira/browse/IGNITE-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Plekhanov resolved IGNITE-21478.
----------------------------------------
    Fix Version/s: 2.17
     Release Note: Fixed OOM crash on unstable topology
       Resolution: Fixed

[~yuri.naryshkin], looks good to me. Merged to master. Thanks for the contribution!

> OOM crash with unstable topology
> --------------------------------
>
>                 Key: IGNITE-21478
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21478
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Luchnikov Alexander
>            Assignee: Yuri Naryshkin
>            Priority: Minor
>              Labels: ise
>             Fix For: 2.17
>
>         Attachments: HistoMinorTop.png, histo.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Use cases:
> 1) A thick client frequently joining and leaving the topology causes the server node to crash with an OOM error.
> 2) Frequent creation and destruction of caches causes the server node to crash with an OOM error.
> *Real case*
> Part of the log before the OOM crash; note *topVer=20098*:
> {code:java}
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>     ^-- Node [id=f080abcd, uptime=3 days, 09:00:55.274]
>     ^-- Cluster [hosts=4, CPUs=6, servers=2, clients=2, topVer=20098, minorTopVer=6]
>     ^-- Network [addrs=[192.168.1.2, 127.0.0.1], discoPort=47500, commPort=47100]
>     ^-- CPU [CPUs=2, curLoad=86.83%, avgLoad=21.9%, GC=23.9%]
>     ^-- Heap [used=867MB, free=15.29%, comm=1024MB]
>     ^-- Outbound messages queue [size=0]
>     ^-- Public thread pool [active=0, idle=7, qSize=0]
>     ^-- System thread pool [active=0, idle=8, qSize=0]
>     ^-- Striped thread pool [active=0, idle=8, qSize=0]
> {code}
> Histogram from the heap dump taken after the node failed:
>  !histo.png! 
> *MinorTop example*
> {code:java}
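>     @Test
>     public void testMinorVer() throws Exception {
>         // One server node plus one thick client.
>         Ignite server = startGrids(1);
>         IgniteEx client = startClientGrid();
>         String cacheName = "cacheName";
>         // 500 create/destroy pairs; the log below shows minorTopVer=1000 afterwards.
>         for (int i = 0; i < 500; i++) {
>             client.getOrCreateCache(cacheName);
>             client.destroyCache(cacheName);
>         }
>         // Keep the JVM alive long enough to take a heap dump manually.
>         System.err.println("Heap dump time");
>         Thread.sleep(1000000);
>     }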
> {code}
> {code:java}
> [INFO ][exchange-worker-#149%internal.IgniteOomTest%][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=2, minorTopVer=1000], evt=DISCOVERY_CUSTOM_EVT, evtNode=52b4c130-1a01-4858-813a-ebc8a5dabf1e, client=true]
> {code}
>  !HistoMinorTop.png! 
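
Both symptoms quoted above show up in the logs as an unusually large topVer or minorTopVer. The following is a minimal sketch of how such lines could be spotted when scanning a log file. It assumes the "topVer=..., minorTopVer=..." format seen in the excerpts above. The class TopologyVersionScan, its methods, and the threshold are hypothetical and not part of Ignite, and the threshold is an arbitrary value chosen by the caller, not an Ignite limit.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Ignite): extracts topology versions from a log line. */
final class TopologyVersionScan {
    /** Matches the "topVer=N, minorTopVer=M" pair as printed in the quoted log excerpts. */
    private static final Pattern TOP_VER =
        Pattern.compile("topVer=(\\d+),\\s*minorTopVer=(\\d+)");

    /** Returns {topVer, minorTopVer} if the line contains both, otherwise empty. */
    static Optional<long[]> parse(String line) {
        Matcher m = TOP_VER.matcher(line);

        if (!m.find())
            return Optional.empty();

        return Optional.of(new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))});
    }

    /** True if either version exceeds the caller-chosen threshold (an assumption, not an Ignite constant). */
    static boolean isSuspicious(String line, long threshold) {
        return parse(line)
            .map(v -> v[0] > threshold || v[1] > threshold)
            .orElse(false);
    }

    public static void main(String[] args) {
        // Sample lines shortened from the excerpts in the issue description.
        String metrics = "^-- Cluster [hosts=4, CPUs=6, servers=2, clients=2, topVer=20098, minorTopVer=6]";
        String exchange = "AffinityTopologyVersion [topVer=2, minorTopVer=1000], evt=DISCOVERY_CUSTOM_EVT";

        System.out.println(isSuspicious(metrics, 500));
        System.out.println(isSuspicious(exchange, 500));
    }
}
```

A check like this only flags candidate lines for a closer look; confirming the cause still requires the heap dump histogram, as in the description above.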



--
This message was sent by Atlassian Jira
(v8.20.10#820010)