Posted to user@ignite.apache.org by Ray <ra...@cisco.com> on 2018/07/24 07:30:27 UTC

Ignite node failed for no obvious reason

I have a three-node Ignite 2.6 cluster set up with the following config.

    <bean id="grid.cfg"
class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="segmentationPolicy" value="RESTART_JVM"/>
        <property name="peerClassLoadingEnabled" value="true"/>
        <property name="failureDetectionTimeout" value="60000"/>
        <property name="dataStorageConfiguration">
            <bean
class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="storagePath" value="/data/ignite/persistence"/>
            <property name="walPath" value="/wal"/>
            <property name="walArchivePath" value="/wal/archive"/>
            <property name="defaultDataRegionConfiguration">
                <bean
class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="name" value="default_Region"/>
                    <property name="initialSize" value="#{100L * 1024 * 1024
* 1024}"/>
                    <property name="maxSize" value="#{460L * 1024 * 1024 *
1024}"/>
                    <property name="persistenceEnabled" value="true"/>
                    <property name="checkpointPageBufferSize" value="#{8L *
1024 * 1024 * 1024}"/>
                </bean>
            </property>
            <property name="walMode" value="BACKGROUND"/>
            <property name="walFlushFrequency" value="5000"/>
            <property name="checkpointFrequency" value="600000"/>
            </bean>
        </property>
        <property name="discoverySpi">
                <bean
class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                    <property name="localPort" value="49500"/>
                    <property name="ipFinder">
                        <bean
class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                            <property name="addresses">
                                <list>
                                <value>node1:49500</value>
                                <value>node2:49500</value>
                                <value>node3:49500</value>
                                </list>
                            </property>
                        </bean>
                    </property>
                </bean>
            </property>
            <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String"
value="config/ignite-log4j2.xml"/>
            </bean>
        </property>
    </bean>
</beans>
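
For completeness, here is the same configuration expressed programmatically (a sketch equivalent to the XML above; the log4j2 logger setup is omitted):

    import java.util.Arrays;

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;
    import org.apache.ignite.plugin.segmentation.SegmentationPolicy;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public class StartNode {
        public static void main(String[] args) {
            // 460 GB persistent data region with a 100 GB initial size and
            // an 8 GB checkpoint page buffer, as in the XML above.
            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("default_Region")
                .setInitialSize(100L * 1024 * 1024 * 1024)
                .setMaxSize(460L * 1024 * 1024 * 1024)
                .setPersistenceEnabled(true)
                .setCheckpointPageBufferSize(8L * 1024 * 1024 * 1024);

            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setStoragePath("/data/ignite/persistence")
                .setWalPath("/wal")
                .setWalArchivePath("/wal/archive")
                .setDefaultDataRegionConfiguration(region)
                .setWalMode(WALMode.BACKGROUND)
                .setWalFlushFrequency(5000)
                .setCheckpointFrequency(600000);

            // Static IP finder listing the three server nodes.
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(Arrays.asList("node1:49500", "node2:49500", "node3:49500"));

            TcpDiscoverySpi disco = new TcpDiscoverySpi();
            disco.setLocalPort(49500);
            disco.setIpFinder(ipFinder);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setSegmentationPolicy(SegmentationPolicy.RESTART_JVM)
                .setPeerClassLoadingEnabled(true)
                .setFailureDetectionTimeout(60000)
                .setDataStorageConfiguration(storage)
                .setDiscoverySpi(disco);

            Ignition.start(cfg);
        }
    }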

I used this command to start the Ignite service on all three nodes.

./ignite.sh -J-Xmx32000m -J-Xms32000m -J-XX:+UseG1GC \
    -J-XX:+ScavengeBeforeFullGC -J-XX:+DisableExplicitGC -J-XX:+AlwaysPreTouch \
    -J-XX:+PrintGCDetails -J-XX:+PrintGCTimeStamps -J-XX:+PrintGCDateStamps \
    -J-XX:+PrintAdaptiveSizePolicy -J-XX:+PrintGCApplicationStoppedTime \
    -J-XX:+PrintGCApplicationConcurrentTime \
    -J-Xloggc:/spare/ignite/log/ignitegc-$(date +%Y_%m_%d-%H_%M).log \
    config/persistent-config.xml

While I was using the Spark DataFrame API to ingest data into this cluster, node2 failed.
The logic of the Spark job is simple: it extracts data from HDFS and ingests it into Ignite day by day.
Each Spark job covers one day's data and usually takes less than 10 minutes to complete.
From the following Spark job web management page, we can see that Job Id 282 took more than 8 hours.
spark.png
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/spark.png>  
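
The job is essentially equivalent to the following sketch (the HDFS path, input format, table name, and key column are placeholders, not the real job):

    import org.apache.ignite.spark.IgniteDataFrameSettings;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class DailyIngest {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("daily-ignite-ingest")
                .getOrCreate();

            // One day's worth of data from HDFS (path and format are placeholders).
            Dataset<Row> day = spark.read().parquet("hdfs:///data/events/2018-07-23");

            // Append the day's rows into an Ignite table through the
            // Ignite DataFrame data source.
            day.write()
                .format(IgniteDataFrameSettings.FORMAT_IGNITE())
                .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), "config/persistent-config.xml")
                .option(IgniteDataFrameSettings.OPTION_TABLE(), "events")
                .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), "id")
                .mode(SaveMode.Append)
                .save();

            spark.stop();
        }
    }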

So I suspect the Ignite cluster started malfunctioning around 2018/07/23 16:25:52.

Starting from line 8619 in ignite-f7266ac7-1-2018-07-28.log, at 2018-07-23T16:26:04, it looks like there is some kind of network issue between node2 and the Spark client, because there are lots of log entries saying "No route to host", etc.
node2-log.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node2-log.zip>  

Node2 was still connected to the cluster at that time, but some time later, at 2018-07-24T00:50:14,941, node2 printed its last metrics log entry and failed.
I can't see any obvious reason why node2 failed, because node2 only prints metrics entries in ignite-f7266ac7.log and the GC times are normal.

So I also checked the logs on node1 and node3 around the time node2 failed. The log snippet from node3 is:
[2018-07-24T00:51:16,622][INFO ][tcp-disco-sock-reader-#4][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.252.10.4:59144, rmtPort=59144]
[2018-07-24T00:51:16,623][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.252.4.60, rmtPort=23056]
[2018-07-24T00:51:16,623][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.252.4.60, rmtPort=23056]
[2018-07-24T00:51:16,624][INFO ][tcp-disco-sock-reader-#44][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.252.4.60:23056, rmtPort=23056]
[2018-07-24T00:51:16,626][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.29.42.45, rmtPort=45605]
[2018-07-24T00:51:16,626][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.29.42.45, rmtPort=45605]
[2018-07-24T00:51:16,626][INFO ][tcp-disco-sock-reader-#45][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.29.42.45:45605, rmtPort=45605]
[2018-07-24T00:51:16,630][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.29.42.48, rmtPort=57089]
[2018-07-24T00:51:16,630][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.29.42.48, rmtPort=57089]
[2018-07-24T00:51:16,630][INFO ][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.29.42.48:57089, rmtPort=57089]
[2018-07-24T00:51:16,666][WARN ][disco-event-worker-#161][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=f7266ac7-409a-454f-b601-b077b15594b3, addrs=[10.252.10.4, 127.0.0.1], sockAddrs=[rpsj1ign002.webex.com/10.252.10.4:49500, /127.0.0.1:49500], discPort=49500, order=2, intOrder=2, lastExchangeTime=1532323119618, loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false]
[2018-07-24T00:51:16,667][INFO ][disco-event-worker-#161][GridDiscoveryManager] Topology snapshot [ver=90, servers=2, clients=8, CPUs=392, offheap=920.0GB, heap=170.0GB]
[2018-07-24T00:51:16,667][INFO ][disco-event-worker-#161][GridDiscoveryManager]   ^-- Node [id=BF7076FB-7C87-4AEE-AC8B-D3DB0AC8A9E2, clusterState=ACTIVE]
[2018-07-24T00:51:16,667][INFO ][disco-event-worker-#161][GridDiscoveryManager]   ^-- Baseline [id=0, size=3, online=2, offline=1]

and the log snippet from node1 is:
[2018-07-24T00:51:16,617][WARN ][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=bf4e1a5c461-0712d9d6-49a3-42df-a8ee-5512764cb1a9, verifierNodeId=0712d9d6-49a3-42df-a8ee-5512764cb1a9, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=f7266ac7-409a-454f-b601-b077b15594b3, addrs=[10.252.10.4, 127.0.0.1], sockAddrs=[rpsj1ign002.webex.com/10.252.10.4:49500, /127.0.0.1:49500], discPort=49500, order=2, intOrder=2, lastExchangeTime=1532323112326, loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=bf4e1a5c461-0712d9d6-49a3-42df-a8ee-5512764cb1a9, verifierNodeId=0712d9d6-49a3-42df-a8ee-5512764cb1a9, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=f7266ac7-409a-454f-b601-b077b15594b3, order=2, addr=[10.252.10.4, 127.0.0.1], daemon=false]]]
[2018-07-24T00:51:16,627][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.29.42.46, rmtPort=58968]
[2018-07-24T00:51:16,627][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.29.42.46, rmtPort=58968]
[2018-07-24T00:51:16,627][INFO ][tcp-disco-sock-reader-#52][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.29.42.46:58968, rmtPort=58968]
[2018-07-24T00:51:16,647][WARN ][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[2018-07-24T00:51:16,648][WARN ][disco-event-worker-#161][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=f7266ac7-409a-454f-b601-b077b15594b3, addrs=[10.252.10.4, 127.0.0.1], sockAddrs=[rpsj1ign002.webex.com/10.252.10.4:49500, /127.0.0.1:49500], discPort=49500, order=2, intOrder=2, lastExchangeTime=1532323112326, loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false]
[2018-07-24T00:51:16,649][INFO ][disco-event-worker-#161][GridDiscoveryManager] Topology snapshot [ver=90, servers=2, clients=8, CPUs=392, offheap=920.0GB, heap=170.0GB]
[2018-07-24T00:51:16,649][INFO ][disco-event-worker-#161][GridDiscoveryManager]   ^-- Node [id=0712D9D6-49A3-42DF-A8EE-5512764CB1A9, clusterState=ACTIVE]
[2018-07-24T00:51:16,649][INFO ][disco-event-worker-#161][GridDiscoveryManager]   ^-- Baseline [id=0, size=3, online=2, offline=1]

I've attached the full logs for all three nodes and the GC log for node2.
ignite-node1.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/ignite-node1.log>  
ignite-node3.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/ignite-node3.zip>  

I also noticed there's no GC activity on node2 between 2018-07-23T19:28:03.401 and 2018-07-24T00:51:51.
Can anybody help me figure out why node2 failed?









Re: Ignite node failed for no obvious reason

Posted by Ray <ra...@cisco.com>.
Hi,

All nodes are inter-connectable.
I think it was a short network outage, because data ingestion was successful until the network glitch.
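
For reference, here is roughly how reachability of the discovery port can be double-checked from each host (a minimal sketch; the hosts and port are taken from my config above):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ConnCheck {
        public static void main(String[] args) {
            // Discovery addresses from the cluster config.
            String[] hosts = {"node1", "node2", "node3"};
            int port = 49500;

            for (String host : hosts) {
                try (Socket sock = new Socket()) {
                    // Fail fast if the host can't be reached within 3 seconds.
                    sock.connect(new InetSocketAddress(host, port), 3000);
                    System.out.println(host + ":" + port + " is reachable");
                } catch (IOException e) {
                    System.out.println(host + ":" + port + " is NOT reachable: " + e.getMessage());
                }
            }
        }
    }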

I'll try building with the IGNITE-8739 fix and try again.




Re: Ignite node failed for no obvious reason

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi,

Are you sure that you have full connectivity between all nodes? I mean both server and client nodes. There are a lot of exceptions like "err=No route to host" in the log, which definitely points to network problems.

At the same time, if you think it was just a short-time outage and connectivity was restored later, then it's possible that this fix could help you:
https://issues.apache.org/jira/browse/IGNITE-8739
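
In the meantime, one way to make the cluster more tolerant of brief glitches is to raise the failure detection timeouts (a minimal sketch; the values are illustrative only, not recommendations for your network):

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class TolerantNode {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration()
                // Illustrative values: how long peers tolerate an unresponsive
                // server or client node before dropping it from the topology.
                // Higher values mean slower failover but more tolerance of
                // short network outages.
                .setFailureDetectionTimeout(120000)
                .setClientFailureDetectionTimeout(120000);

            Ignition.start(cfg);
        }
    }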

Regards,
Evgenii
