Posted to user@ignite.apache.org by Artem Zinnatullin <ar...@gmail.com> on 2018/07/27 21:36:35 UTC

Ignite Server nodes crash with OutOfMemoryError on cluster upscale

Hello, dear Ignite users & developers

I'm running an Ignite 2.6.0 cluster in k8s as a partitioned cache
(off-heap) with an eviction policy. It all works great and handles around
100K RPM at peak (many thanks for the software!).

The problem I'm facing is that on a cluster topology change, all existing
Ignite server instances crash with a Java OutOfMemoryError and lose data
(luckily it's only a cache, but it is still important for the performance
of our systems).

The first time I noticed this, I increased the Java heap for the Ignite
server nodes to 3 GB, and that seemed to work for a while during minor
topology changes (i.e. adding one node or removing one node).

But I just increased the number of nodes from 8 to 12, so 4 new nodes, and
all already-running nodes crashed with OOM again.

It looks like the more nodes we try to add to the cluster at around the
same time, the more memory the existing nodes need to handle the change.

Do you have any recommendations on how much Java heap an Ignite server
needs for a given cluster size? Note that the actual data is stored
off-heap (see configs below).

Thanks!

Configs:

k8s Ignite server containers:

containers:
      - name: ignite-node
        image: apacheignite/ignite:2.6.0
        resources:
          requests:
            memory: "28G"
            cpu: 1
          limits:
            memory: "29G"
            cpu: 2

JVM_OPTS:
-server \
-Djava.net.preferIPv4Stack=true \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseCGroupMemoryLimitForHeap \
-XX:MaxDirectMemorySize=25g \
-Xms1g \
-Xmx3g \
-XX:+UseG1GC \
-XX:+AlwaysPreTouch \
-XX:+ScavengeBeforeFullGC \
-XX:+DisableExplicitGC

XML:

<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Ideal page size should be about the size of an average entry. Set to 8 KB (default is 4 KB). -->
        <property name="pageSize" value="8192"/>

        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="name" value="Default_Region"/>
                <!-- 24 GB initial size per node. -->
                <property name="initialSize" value="#{24L * 1024 * 1024 * 1024}"/>
                <!-- 24 GB max size per node. -->
                <property name="maxSize" value="#{24L * 1024 * 1024 * 1024}"/>
                <!-- Enable data pages eviction. -->
                <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
                <!--
                    The default page size is 4 KB and the default empty page pool size is 100,
                    which means only ~400 KB of memory is ready for use. With frequent writes and
                    larger data objects we can get IgniteOutOfMemoryException even though eviction
                    is enabled. The docs recommend increasing this value in such cases: the number
                    of available pages should be large enough to fit the largest entry. Set 25600
                    pages (each page is 8 KB, see above) to have 200 MB of memory ready for use.
                -->
                <property name="emptyPagesPoolSize" value="25600"/>
                <!-- Enable periodic metrics collection. -->
                <property name="metricsEnabled" value="true"/>
            </bean>
        </property>
    </bean>
</property>

<property name="cacheConfiguration">
    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="my_cache"/>
        <property name="cacheMode" value="PARTITIONED"/>
        <property name="dataRegionName" value="Default_Region"/>
    </bean>
</property>



Kind regards,
Artem Zinnatullin.

Re: Ignite Server nodes crash with OutOfMemoryError on cluster upscale

Posted by Denis Mekhanikov <dm...@gmail.com>.
Artem,

The heap space required by Ignite depends on the workload you run on the
cluster. SQL is usually the most memory-consuming part. I can't say what
causes the node failure in your case.
When the cluster topology changes, data rebalancing is triggered, and it
may pollute the heap.

You can make Java generate a heap dump when an OOME happens:
http://www.oracle.com/technetwork/java/javase/clopts-139448.html#gbzrr
Analyzing the dump may give you a clue.
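
As a rough sketch of that suggestion, the two standard HotSpot options
below could be appended to the JVM_OPTS from the original message. The dump
path is only an assumption for illustration: it should point to a volume
that survives a container restart and has room for a dump of roughly the
heap size (up to 3 GB with -Xmx3g).

-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp/ignite-heapdump.hprof

The resulting .hprof file can then be opened in a heap analyzer (Eclipse
MAT, for example) to see which objects dominate the heap at the time of
the failure.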

Denis

