Posted to user@ignite.apache.org by "Ch. Vishal Ratnam" <vi...@gmail.com> on 2022/01/28 11:22:24 UTC

System critical exception and slowing down of processing speed

Hi folks,

We're trying to migrate 150 million records from one cache to another in a
2-node cluster. We need this data to have backups, and since an existing
cache can't be modified to add backups, we created a similar cache with
backups: 1 and are now migrating the data to this new cache.

Ignite version: 2.7.5

We're connecting to the nodes through the Java Ignite thin client. We start
a client node on the application server and use the thin client to connect
to that client node; the client node connects to the server nodes.
We tried it this way:

Iterator<Cache.Entry<Object, Object>> iter =
    ignite.cache("old_cache_name").iterator();

while (iter.hasNext()) {

    OldCache object = (OldCache) iter.next().getValue();

    // Collect objects into a map
    map_object.put(object.OldCacheId(), object);

    // Send a batch once 20000 objects are collected
    if (map_object.size() == 20000) {
        IgniteCache<Object, Object> cachenew = ignite.cache("new_cache_name");
        cachenew.putAllAsync(map_object);
        map_object.clear();
        LOG.error("Finished sending a batch to Ignite");
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            LOG.error("Exception occurred in main thread: {}", e);
        }
    }
}

We have used putAllAsync() to send the data to the cache.
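(For readers of the archive: an alternative worth sketching for bulk loads of this size is Ignite's IgniteDataStreamer, which is available in 2.7.5 on the node API, i.e. it would run on the client node rather than through the thin client, and typically outperforms batched putAll. A minimal sketch, assuming the cache names above; the node configuration is omitted and would need to match the cluster:)

```java
import javax.cache.Cache;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class CacheMigration {
    public static void main(String[] args) {
        // Start a node to join the cluster (configuration omitted here).
        Ignite ignite = Ignition.start();

        // The streamer batches entries and routes each batch to the node
        // that owns it; close() flushes whatever is still buffered.
        try (IgniteDataStreamer<Object, Object> streamer =
                 ignite.dataStreamer("new_cache_name")) {
            streamer.allowOverwrite(true); // needed if keys may already exist

            for (Cache.Entry<Object, Object> e :
                     ignite.<Object, Object>cache("old_cache_name")) {
                streamer.addData(e.getKey(), e.getValue());
            }
        }
    }
}
```

The streamer also removes the need for the manual 20000-entry batching and the Thread.sleep() pacing, since it applies back-pressure itself.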

With this approach we're able to migrate around 80 million records. After
70-80 million records, processing becomes very slow and data in the new
cache grows at a very slow rate. The Ignite server nodes start printing
the following errors:
"Critical system error detected. Will be handled accordingly to configured
handler" (the configured handler is NoOpFailureHandler) and "Blocked system
critical thread has been detected".

The 2 Ignite servers are hosted on AWS memory-optimized instances with
512 GB RAM and 64 vCPUs each, on AMD EPYC 7000 series processors.
Ignite server node properties:
Default data region initialSize: 105 GB
Default data region maxSize: 220 GB
persistence: enabled
checkpointPageBufferSize: 8 GB
writeThrottlingEnabled: true
JVM Xmx: 30 GB
JVM Xms: 30 GB
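(For reference, those storage settings correspond roughly to this programmatic configuration; a sketch using the Ignite 2.x configuration API with the values listed above:)

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ServerConfig {
    static final long GB = 1024L * 1024 * 1024;

    public static IgniteConfiguration build() {
        // Default data region: 105 GB initial, 220 GB max, persistent,
        // with an 8 GB checkpoint page buffer.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setInitialSize(105 * GB)
            .setMaxSize(220 * GB)
            .setPersistenceEnabled(true)
            .setCheckpointPageBufferSize(8 * GB);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWriteThrottlingEnabled(true);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```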

We've been trying to get this data migration right for a week by tweaking a
lot of parameters, but nothing seems to help. Any help would be greatly
appreciated.


Thanks & Regards,
Vishal Ratnam

Re: System critical exception and slowing down of processing speed

Posted by Alex Plehanov <pl...@gmail.com>.
Hello,

The async API in the Apache Ignite Java thin client was introduced only in
version 2.10 (there was no putAllAsync method in 2.7.5), so perhaps
something is off in your description.
Is the OldCacheId() property unique? If not, a deadlock is possible.
Also, since persistence is enabled, check the server logs for the
"Throttling is applied to page modifications" message.
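(On the deadlock point: a common way to avoid lock-ordering deadlocks with putAll is to build each batch in a TreeMap, so that every batch acquires entry locks in the same, key-sorted order. A minimal plain-Java sketch of the ordering property; the keys and values here are made up for illustration:)

```java
import java.util.Map;
import java.util.TreeMap;

public class SortedBatch {
    // A TreeMap iterates its keys in natural order regardless of
    // insertion order, so concurrent putAll calls built this way
    // touch cache entries in a consistent order.
    static Map<Long, String> sortedBatch() {
        Map<Long, String> batch = new TreeMap<>();
        batch.put(42L, "b");
        batch.put(7L, "a");
        batch.put(100L, "c");
        return batch;
    }

    public static void main(String[] args) {
        System.out.println(sortedBatch().keySet()); // [7, 42, 100]
    }
}
```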

Fri, Jan 28, 2022 at 14:23, Ch. Vishal Ratnam <vi...@gmail.com>:

> [quoted text of the original message trimmed]