Posted to user@ignite.apache.org by tysli2016 <To...@cityline.com.hk> on 2017/05/04 08:57:47 UTC

OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

Got an "OutOfMemoryError: Java heap space" on a 2-node cluster with `visor`
running repeatedly.

The server nodes run on CentOS 7 inside Oracle VirtualBox VMs with the same
config:
- 2 vCPUs
- 3.5GB memory
- Oracle JDK 1.8.0_121

`default-config.xml` was modified to use non-default multicast group and 1
backup:
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">
        <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
            <property name="discoverySpi">
                <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                    <property name="ipFinder">
                        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                            <property name="multicastGroup" value="228.10.10.158"/>
                        </bean>
                    </property>
                </bean>
            </property>
            <property name="cacheConfiguration">
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="backups" value="1"/>
                </bean>
            </property>
        </bean>
    </beans>
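For reference, the same settings can be built programmatically instead of via
Spring XML. This is an untested sketch against the Ignite 1.9 Java API; the
class name `StartNode` is illustrative:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder;

public class StartNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Non-default multicast group, matching the XML config above.
        TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
        ipFinder.setMulticastGroup("228.10.10.158");

        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);
        cfg.setDiscoverySpi(discovery);

        // One backup copy per partition, as in the XML config.
        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>();
        cacheCfg.setBackups(1);
        cfg.setCacheConfiguration(cacheCfg);

        Ignition.start(cfg);
    }
}
```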


The `visor` was run repeatedly on one of the nodes by a shell script:
    #!/bin/bash
    IGNITE_HOME=/root/apache-ignite-fabric-1.9.0-bin
    while true
    do
      ${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"
    done
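For a cron-driven check, each run can capture one visor invocation's output
and count the node rows. This is only a sketch: the sample output and the
grep pattern below are assumptions about the `node` command's table format
and would need to be adjusted to the real output.

```shell
#!/bin/bash
# Count node rows in a captured visor run. The "(@n" pattern is an
# assumption about what the `node` command prints; adjust to the real output.
count_nodes() {
  grep -c '(@n' "$1"
}

# In production the file would come from a single visor invocation, e.g.:
#   ${IGNITE_HOME}/bin/ignitevisorcmd.sh \
#     -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'" > /tmp/visor.out
# Here a fabricated sample stands in so the parsing can be demonstrated.
sample=$(mktemp)
cat > "$sample" <<'EOF'
+--------------------------------------------+
| # | ID8(@), IP              | Node Type    |
+--------------------------------------------+
| 0 | 1D2933F4(@n0), 10.0.0.1 | Server       |
| 1 | 9A31CE55(@n1), 10.0.0.2 | Server       |
+--------------------------------------------+
EOF
n=$(count_nodes "$sample")
echo "$n"
rm -f "$sample"
```

A wrapper like this keeps the cron job a one-liner: run visor once, count the
rows, and alert if the count drops below the expected number of servers.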


The OOME was thrown after the above setup had been running for about 1 day.
I have put the ignite log, gc log, and heap dump in `dee657c8.tgz`, which can
be downloaded from
https://drive.google.com/drive/folders/0BwY2dxDlRYhBSFJhS0ZWOVBiNk0?usp=sharing.
`507f0201.tgz` contains the ignite log and gc log from the other node in the
cluster, for reference just in case.

Running `visor` repeatedly is just to reproduce the OOME more quickly; in
production we run `visor` once every 10 minutes to monitor the health of the
cluster.

Questions:
1. Is anything wrong with the configuration? Can anything be tuned to avoid
the OOME?
2. Are there any other built-in tools that allow one to monitor the cluster?
Showing the number of server nodes would be good enough.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tp12409.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

Posted by Andrey Novikov <an...@apache.org>.
"nc - Total number of nodes in the grid" counts server + client nodes. I
can't find a metric for server nodes only.

I tried inspecting the heap dump in MAT (Eclipse Memory Analyzer) and found
that TcpCommunicationSpi#recoveryDescs occupies a large amount of heap. Does
anyone have an idea why this happened?




Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

Posted by tysli2016 <To...@cityline.com.hk>.
Thanks Andrey, is there an option to monitor the number of server nodes in
the grid?

I found "nc - Total number of nodes in the grid.", which seems to count
server + client nodes, correct?




Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

Posted by Andrey Novikov <an...@apache.org>.
Hi tysli2016,

You can keep a visorcmd connected to the cluster running in the background
and register an alert. When the alert is triggered, it can call a custom user
script.

More info can be found here:
https://apacheignite-tools.readme.io/docs/alerts-configuration




Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

Posted by tysli2016 <To...@cityline.com.hk>.
Thanks Evgenii!

When running `${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"`,
it shows "Ignite node stopped OK" at the end. Is that an indicator that visor
stopped properly?

We use the visor output to check the number of Ignite servers running. This
check is triggered by a cron job + shell script, so it starts a new visor
each time.

How could a shell script use an already started visor?
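One possible pattern (an assumption on my part; the thread does not spell
this out) is to keep a single long-lived visor reading commands from a named
pipe, so the cron job only writes a command into the pipe instead of starting
a new visor (and a new node) each time. The sketch below demonstrates the
plumbing with `cat` standing in for visor:

```shell
#!/bin/bash
# Sketch: one long-lived reader keeps the pipe open; cron jobs write to it.
pipe=$(mktemp -u)
mkfifo -m 600 "$pipe"

# With a real visor the reader side would look something like:
#   tail -f "$pipe" | ${IGNITE_HOME}/bin/ignitevisorcmd.sh > /var/log/visor.out
# Here `cat` stands in for visor so the mechanics can be demonstrated.
out=$(mktemp)
cat "$pipe" > "$out" &
reader=$!

# What the cron job would do: write a visor command into the pipe.
echo "node" > "$pipe"

wait "$reader"
result=$(cat "$out")
echo "$result"
rm -f "$pipe" "$out"
```

With `cat`, the reader exits when the writer closes the pipe; a real setup
would use `tail -f` (as in the comment) so the visor side stays alive across
multiple cron invocations.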




Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi,

As I see it, you run visor in internal mode, so it creates a node each time.
Are you sure that you stop them properly?

Why do you need to start a new visor each time? Just use an already started
visor.

Evgenii



