Posted to user@ignite.apache.org by joseheitor <jo...@heitorprojects.com> on 2018/03/20 10:14:43 UTC

[2.4.0] Cluster unrecoverable after node failure

*Scenario A: Secondary (Node-B) Failure*

Environment: 
  - 2 nodes (Node-A, Node-B)
  - Ignite native persistence enabled
  - static IP discovery - both node IPs listed
  - JDBC (Client) - DBeaver
  - manual cluster activation

Steps:
  1 - start both nodes with no data
  2 - activate cluster from the same machine as Node-A
  3 - load data via SQL JDBC (...WITH template=replicated, backups=1)
  4 - simulate power-failure ... all components down; Node-B with
unrecoverable damage
  5 - start new Node-B instance (with no data)
  6 - attempt to start Node-A (undamaged, with data)... 

/PROBLEM: Unable to start Node-A. Error follows:

[10:43:37,773][SEVERE][main][IgniteKernal] Failed to start manager:
GridManagerAdapter [enabled=true,
name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI:
TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
marsh=JdkMarshaller
[clsFilter=org.apache.ignite.internal.IgniteKernal$5@cc6460c], reconCnt=10,
reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false,
clientReconnectDisabled=false]
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:892)
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1669)
        at
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:983)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
        at
org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
        at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
        at org.apache.ignite.Ignition.start(Ignition.java:347)
        at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up
BaselineTopology is not allowed to join cluster without one:
b5c7b617-4a34-4e3a-b119-97744f72258e
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1856)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:932)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:364)
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1930)
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        ... 13 more
[10:43:37,778][SEVERE][main][IgniteKernal] Got exception while starting
(will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start manager:
GridManagerAdapter [enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1674)
        at
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:983)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
        at
org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
        at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
        at org.apache.ignite.Ignition.start(Ignition.java:347)
        at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
marsh=JdkMarshaller
[clsFilter=org.apache.ignite.internal.IgniteKernal$5@cc6460c], reconCnt=10,
reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false,
clientReconnectDisabled=false]
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:892)
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1669)
        ... 11 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up
BaselineTopology is not allowed to join cluster without one:
b5c7b617-4a34-4e3a-b119-97744f72258e
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1856)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:932)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:364)
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1930)
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        ... 13 more
[10:43:42] Ignite node stopped OK [uptime=00:00:09.193]
class org.apache.ignite.IgniteException: Failed to start manager:
GridManagerAdapter [enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:980)
        at org.apache.ignite.Ignition.start(Ignition.java:350)
        at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
manager: GridManagerAdapter [enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1674)
        at
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:983)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
        at
org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
        at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
        at org.apache.ignite.Ignition.start(Ignition.java:347)
        ... 1 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
marsh=JdkMarshaller
[clsFilter=org.apache.ignite.internal.IgniteKernal$5@cc6460c], reconCnt=10,
reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false,
clientReconnectDisabled=false]
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:892)
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1669)
        ... 11 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up
BaselineTopology is not allowed to join cluster without one:
b5c7b617-4a34-4e3a-b119-97744f72258e
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1856)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:932)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:364)
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1930)
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        ... 13 more
Failed to start grid: Failed to start manager: GridManagerAdapter
[enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
/
  7 - stop Node-B
  8 - start Node-A
  9 - start Node-B
  10 - attempt to activate cluster

/PROBLEM: Freezes - see below:

Activating the Apache Ignite cluster:
Control utility [ver. 2.4.0#20180305-sha1:aa342270]
2018 Copyright(C) Apache Software Foundation
User: jose
--------------------------------------------------------------------------------
/
Additional notes:
- All data is lost and cannot be recovered.
- This did not occur with Ignite 2.3.0, although with that version data
consistency was unpredictable and would only sometimes align after some
period of time.
- If Node-A (with data) is started before Node-B (new instance, empty data),
both nodes start, but the cluster fails to activate. The following error is
observed in the Node-A output:

/Activating the Apache Ignite cluster:
Control utility [ver. 2.4.0#20180305-sha1:aa342270]
2018 Copyright(C) Apache Software Foundation
User: jose
--------------------------------------------------------------------------------
Failed to activate cluster.
Connection to cluster failed.
Error: Failed to perform request (connection failed): /192.168.1.230:11211

[11:37:36,455][SEVERE][exchange-worker-#36][GridDhtPartitionsExchangeFuture]
Failed to activate node components
[nodeId=4796b18e-5e55-4abc-bea6-1d060363079f, client=false,
topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]
class org.apache.ignite.IgniteCheckedException: Failed to find cache group
descriptor [grpId=-2066691984]
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:2019)
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1957)
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1827)
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:725)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:844)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:596)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2337)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:748)
[11:37:37,261][SEVERE][sys-#46][GridDhtPartitionsExchangeFuture] Failed to
notify listener:
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2@532d7b05
java.lang.NullPointerException
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:765)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3021)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2053)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:124)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1928)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1531)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:133)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:312)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2689)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2668)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[11:37:37,275][SEVERE][sys-#46][GridCacheIoManager] Failed processing
message [senderId=f5ba8cfd-dc92-48d1-a28f-402c4a60186c,
msg=GridDhtPartitionsSingleMessage [parts={-2100569601=GridDhtPartitionMap
[moving=0, top=AffinityTopologyVersion [topVer=2, minorTopVer=1],
updateSeq=2, size=0], -1853271209=GridDhtPartitionMap [moving=0,
top=AffinityTopologyVersion [topVer=2, minorTopVer=1], updateSeq=2,
size=0]}, partCntrs={-2100569601=CachePartitionPartialCountersMap {},
-1853271209=CachePartitionPartialCountersMap {}}, partHistCntrs=null,
err=null, client=false, compress=false, finishMsg=null,
super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId
[topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], discoEvt=null,
nodeId=f5ba8cfd, evt=DISCOVERY_CUSTOM_EVT], lastVer=GridCacheVersion
[topVer=0, order=1521538636418, nodeOrder=0], super=GridCacheMessage
[msgId=1, depInfo=null, err=null, skipPrepare=false]]]]
java.lang.NullPointerException
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:765)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3021)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2053)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:124)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1928)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1531)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:133)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:312)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2689)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2668)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)/
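
Note on reading the startup trace from step 6: the same chain of wrapped exceptions is logged three times, and the last "Caused by:" entry is the one that states the actual reason (the BaselineTopology join rejection). The following is only a minimal sketch for pulling that entry out of a line-wrapped log like the one above. It assumes plain Java 8+, and the class name RootCause and method lastCause are hypothetical helpers, not part of Ignite.

```java
// Hypothetical helper (not an Ignite API): finds the innermost "Caused by:" entry
// in a stack trace whose long lines were wrapped by a mail archive.
final class RootCause {
    private static final String MARKER = "Caused by:";

    /** Returns the message of the last "Caused by:" entry, or null if there is none. */
    static String lastCause(String log) {
        String[] lines = log.split("\\R");

        // Locate the last "Caused by:" line; it belongs to the innermost exception.
        int start = -1;
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].startsWith(MARKER))
                start = i;
        }
        if (start < 0)
            return null;

        StringBuilder msg = new StringBuilder(lines[start].substring(MARKER.length()).trim());

        // The message may continue on following lines; stop at the first stack frame
        // ("at", possibly alone on its line) or at a "... N more" marker.
        for (int i = start + 1; i < lines.length; i++) {
            String t = lines[i].trim();
            if (t.equals("at") || t.startsWith("at ") || t.startsWith("..."))
                break;
            msg.append(' ').append(t);
        }
        return msg.toString();
    }

    public static void main(String[] args) {
        // Excerpt copied from the Node-A startup trace in Scenario A.
        String log = String.join("\n",
            "Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up",
            "BaselineTopology is not allowed to join cluster without one:",
            "b5c7b617-4a34-4e3a-b119-97744f72258e",
            "        at",
            "org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:932)",
            "        ... 13 more");
        System.out.println(lastCause(log));
    }
}
```

Run against the excerpt, this prints the IgniteSpiException message re-joined onto a single line. The sketch only re-joins text; it does not interpret the error.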

=================================================================

*Scenario B: Primary (Node-A) Failure*

Environment: 
  - 2 nodes (Node-A, Node-B)
  - Ignite native persistence enabled
  - static IP discovery - both node IPs listed
  - JDBC (Client) - DBeaver
  - manual cluster activation

Steps:
  1 - start both nodes with no data
  2 - activate cluster from the same machine as Node-A
  3 - load data via SQL JDBC (...WITH template=replicated, backups=1)
  4 - simulate power-failure ... all components down; Node-A with
unrecoverable damage
  5 - start new instance of Node-A (with no data)
  6 - attempt to start Node-B (undamaged, with data)... 

/PROBLEM: Unable to start Node-B. Error follows:

[11:52:44,218][SEVERE][main][IgniteKernal] Failed to start manager:
GridManagerAdapter [enabled=true,
name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI:
TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
marsh=JdkMarshaller
[clsFilter=org.apache.ignite.internal.IgniteKernal$5@cc6460c], reconCnt=10,
reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false,
clientReconnectDisabled=false]
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:892)
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1669)
        at
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:983)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
        at
org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
        at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
        at org.apache.ignite.Ignition.start(Ignition.java:347)
        at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up
BaselineTopology is not allowed to join cluster without one:
521d42cc-a827-4eb4-b1f1-8c811fc92b61
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1856)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:932)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:364)
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1930)
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        ... 13 more
[11:52:44,223][SEVERE][main][IgniteKernal] Got exception while starting
(will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start manager:
GridManagerAdapter [enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1674)
        at
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:983)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
        at
org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
        at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
        at org.apache.ignite.Ignition.start(Ignition.java:347)
        at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
marsh=JdkMarshaller
[clsFilter=org.apache.ignite.internal.IgniteKernal$5@cc6460c], reconCnt=10,
reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false,
clientReconnectDisabled=false]
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:892)
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1669)
        ... 11 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up
BaselineTopology is not allowed to join cluster without one:
521d42cc-a827-4eb4-b1f1-8c811fc92b61
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1856)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:932)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:364)
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1930)
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        ... 13 more
[11:52:49] Ignite node stopped OK [uptime=00:00:09.751]
class org.apache.ignite.IgniteException: Failed to start manager:
GridManagerAdapter [enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:980)
        at org.apache.ignite.Ignition.start(Ignition.java:350)
        at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
manager: GridManagerAdapter [enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1674)
        at
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:983)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
        at
org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
        at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
        at org.apache.ignite.Ignition.start(Ignition.java:347)
        ... 1 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
marsh=JdkMarshaller
[clsFilter=org.apache.ignite.internal.IgniteKernal$5@cc6460c], reconCnt=10,
reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false,
clientReconnectDisabled=false]
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:892)
        at
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1669)
        ... 11 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up
BaselineTopology is not allowed to join cluster without one:
521d42cc-a827-4eb4-b1f1-8c811fc92b61
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1856)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:932)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:364)
        at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1930)
        at
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        ... 13 more
Failed to start grid: Failed to start manager: GridManagerAdapter
[enabled=true,
name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]/

  7 - stop Node-A
  8 - start Node-B
  9 - start Node-A
  10 - attempt to activate cluster

/PROBLEM: Cluster activation fails. The following error is observed on
Node-B output:

[11:55:48,868][SEVERE][exchange-worker-#36][GridDhtPartitionsExchangeFuture]
Failed to activate node components
[nodeId=f7cfdfa5-f0e9-4bda-8e79-842cd5d35ebb, client=false,
topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]
class org.apache.ignite.IgniteCheckedException: Failed to find cache group
descriptor [grpId=-2066691984]
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:2019)
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1957)
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1827)
        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:725)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:844)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:596)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2337)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:748)
[11:55:50,060][SEVERE][sys-#42][GridDhtPartitionsExchangeFuture] Failed to
notify listener:
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2@5b374f7
java.lang.NullPointerException
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:765)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3021)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2053)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:124)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1928)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1531)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:133)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:312)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2689)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2668)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[11:55:50,073][SEVERE][sys-#42][GridCacheIoManager] Failed processing
message [senderId=da1c1c02-095f-4e5e-8eb6-39a754422b82,
msg=GridDhtPartitionsSingleMessage [parts={-2100569601=GridDhtPartitionMap
[moving=0, top=AffinityTopologyVersion [topVer=2, minorTopVer=1],
updateSeq=2, size=0], -1853271209=GridDhtPartitionMap [moving=0,
top=AffinityTopologyVersion [topVer=2, minorTopVer=1], updateSeq=2,
size=0]}, partCntrs={-2100569601=CachePartitionPartialCountersMap {},
-1853271209=CachePartitionPartialCountersMap {}}, partHistCntrs=null,
err=null, client=false, compress=false, finishMsg=null,
super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId
[topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], discoEvt=null,
nodeId=da1c1c02, evt=DISCOVERY_CUSTOM_EVT], lastVer=GridCacheVersion
[topVer=0, order=1521539726554, nodeOrder=0], super=GridCacheMessage
[msgId=1, depInfo=null, err=null, skipPrepare=false]]]]
java.lang.NullPointerException
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:765)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3021)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2053)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:124)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1928)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1916)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1531)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:133)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:312)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2689)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2668)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)/

Additional notes:
- All data is lost and cannot be recovered.
- This did not occur with Ignite 2.3.0, although data consistency there was
unpredictable but would sometimes align after a period of time.
- If Node-B (with data) is started before Node-A (new instance, empty data),
both nodes start and the cluster can be activated successfully. Data is intact.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: [2.4.0] Cluster unrecoverable after node failure

Posted by Pavel Vinokurov <vi...@gmail.com>.
Hi Jose,

>>
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with set up
BaselineTopology is not allowed to join cluster without one
>>

That is expected behavior. The "crashed" node has an empty baseline. A node
with an old baseline cannot join a cluster that has an empty baseline
topology, so the node with data should be started first.

Baseline topology is described on the following pages:
https://apacheignite.readme.io/docs/cluster-activation#section-baseline-topology
https://cwiki.apache.org/confluence/display/IGNITE/IEP-4+Baseline+topology+for+caches


Thanks,
Pavel

2018-03-24 17:59 GMT+03:00 joseheitor <jo...@heitorprojects.com>:

> Thanks Arseny - I really appreciate your assistance!
>
> The config files that I use are included in the attached archive.
> ignite-replicated.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/
> t1652/ignite-replicated.zip>
>
> Let me know if you need anything else or any clarification.
>
> Below, for your reference, are the SQL commands that I use to populate the
> database with test data:
>
> DROP TABLE PUBLIC.Person;
> DROP TABLE PUBLIC.City;
>
> CREATE TABLE PUBLIC.City (
>   id LONG PRIMARY KEY, name VARCHAR)
>   WITH "TEMPLATE=REPLICATED, BACKUPS=1, ATOMICITY=TRANSACTIONAL, WRITE_SYNCHRONIZATION_MODE=FULL_SYNC";
>
> CREATE TABLE PUBLIC.Person (
>   id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
>   WITH "TEMPLATE=REPLICATED, BACKUPS=1, ATOMICITY=TRANSACTIONAL, WRITE_SYNCHRONIZATION_MODE=FULL_SYNC";
>
> INSERT INTO PUBLIC.City (id, name) VALUES (1, 'Forest Hill');
> INSERT INTO PUBLIC.City (id, name) VALUES (2, 'Denver');
> INSERT INTO PUBLIC.City (id, name) VALUES (3, 'St. Petersburg');
> INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (1, 'John Doe', 3);
> INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
> INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
> INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);
>
> SELECT p.name, c.name
> FROM PUBLIC.Person p, PUBLIC.City c
> WHERE p.city_id = c.id AND c.name = 'Denver';
>
> SELECT COUNT(*) FROM PUBLIC.Person;
> SELECT COUNT(*) FROM PUBLIC.City;
>
> DELETE FROM PUBLIC.Person WHERE name = 'Jane Roe';



-- 

Regards

Pavel Vinokurov

Re: [2.4.0] Cluster unrecoverable after node failure

Posted by joseheitor <jo...@heitorprojects.com>.
Thanks Arseny - I really appreciate your assistance!

The config files that I use are included in the attached archive.
ignite-replicated.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1652/ignite-replicated.zip>  

Let me know if you need anything else or any clarification.

Below, for your reference, are the SQL commands that I use to populate the
database with test data:

DROP TABLE PUBLIC.Person;
DROP TABLE PUBLIC.City;

CREATE TABLE PUBLIC.City (
  id LONG PRIMARY KEY, name VARCHAR)
  WITH "TEMPLATE=REPLICATED, BACKUPS=1, ATOMICITY=TRANSACTIONAL, WRITE_SYNCHRONIZATION_MODE=FULL_SYNC";

CREATE TABLE PUBLIC.Person (
  id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
  WITH "TEMPLATE=REPLICATED, BACKUPS=1, ATOMICITY=TRANSACTIONAL, WRITE_SYNCHRONIZATION_MODE=FULL_SYNC";

INSERT INTO PUBLIC.City (id, name) VALUES (1, 'Forest Hill');
INSERT INTO PUBLIC.City (id, name) VALUES (2, 'Denver');
INSERT INTO PUBLIC.City (id, name) VALUES (3, 'St. Petersburg');
INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (1, 'John Doe', 3);
INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
INSERT INTO PUBLIC.Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);

SELECT p.name, c.name
FROM PUBLIC.Person p, PUBLIC.City c
WHERE p.city_id = c.id AND c.name = 'Denver';

SELECT COUNT(*) FROM PUBLIC.Person;
SELECT COUNT(*) FROM PUBLIC.City;

DELETE FROM PUBLIC.Person WHERE name = 'Jane Roe';




Re: [2.4.0] Cluster unrecoverable after node failure

Posted by Arseny Kovalchuk <ar...@synesis.ru>.
Hi Jose.

Did you test from scratch? I mean that after updating a config file you should
start from a clean cluster (wipe all data from all nodes) so that the cluster
remembers its new baseline topology with the new node IDs. Only then
repeat the test of wiping one node.

I cannot promise to reproduce this right now (too much work), but it looks like
we could hit the same scenario in our environment, so I'll try when I get
some time. Please share the configuration files from all nodes, zipped
and attached here or via GitHub.


​
Arseny Kovalchuk

Senior Software Engineer at Synesis
skype: arseny.kovalchuk
mobile: +375 (29) 666-16-16
LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>

On 23 March 2018 at 13:44, joseheitor <jo...@heitorprojects.com> wrote:

> Hi Arseny,
>
> Regrettably, I am still experiencing the same results.
>
> Could there be something else that I am overlooking? My configurations are
> quite basic - I can post them if that would help you reproduce the
> issue...
>
> Thanks

Re: [2.4.0] Cluster unrecoverable after node failure

Posted by joseheitor <jo...@heitorprojects.com>.
Hi Arseny,

Regrettably, I am still experiencing the same results.

Could there be something else that I am overlooking? My configurations are
quite basic - I can post them if that would help you reproduce the
issue...

Thanks




Re: [2.4.0] Cluster unrecoverable after node failure

Posted by Arseny Kovalchuk <ar...@synesis.ru>.
Yes, it can be set in a configuration file. You can do it as in the
example below (please do not just copy and paste it; put in your own values
for the properties). Check your configuration file, find the bean with
class org.apache.ignite.configuration.IgniteConfiguration, and add a
property tag to that bean.

See the *consistentId* property. In this example it is set from an environment
variable, but you can provide any string value there, such as node-a, node-b, etc.

<bean id="igniteConfig" class="org.apache.ignite.configuration.IgniteConfiguration">
  <!-- <property name="igniteInstanceName" value="#{ systemEnvironment['IGNITE_HOSTNAME'] }" /> -->
  <property name="localHost" value="#{ systemEnvironment['IGNITE_POD_IP'] }" />
  <property name="peerClassLoadingEnabled" value="false" />
  <property name="metricsLogFrequency" value="0" />
  <property name="clientFailureDetectionTimeout"
            value="#{ systemEnvironment['IGNITE_FAILURE_DETECTION_TIMEOUT']?: 1 * 60 * 1000 }" />
  <property name="failureDetectionTimeout"
            value="#{ systemEnvironment['IGNITE_FAILURE_DETECTION_TIMEOUT']?: 1 * 60 * 1000 }" />
  <property name="workDirectory" value="/ignite-work-directory" />
  <property name="consistentId" value="#{ systemEnvironment['IGNITE_CONSISTENT_ID'] }" />

  <!-- set pool sizes because in k8s env
       Runtime.getRuntime().availableProcessors() returns incorrect value -->
  <property name="systemThreadPoolSize" value="8" />
  <property name="publicThreadPoolSize" value="8" />
  <property name="queryThreadPoolSize" value="8" />
  <property name="serviceThreadPoolSize" value="8" />
  <property name="stripedPoolSize" value="8" />
  <property name="dataStreamerThreadPoolSize" value="8" />
  <property name="asyncCallbackPoolSize" value="8" />
  <property name="managementThreadPoolSize" value="4" />
  <property name="peerClassLoadingThreadPoolSize" value="4" />
  <property name="igfsThreadPoolSize" value="4" />
  <property name="utilityCachePoolSize" value="4" />

  <property name="connectorConfiguration">
    <bean class="org.apache.ignite.configuration.ConnectorConfiguration">
      <property name="selectorCount" value="4" />
      <property name="threadPoolSize" value="8" />
    </bean>
  </property>

  <property name="gridLogger">
    <bean class="org.apache.ignite.logger.slf4j.Slf4jLogger" />
  </property>

  <property name="dataStorageConfiguration" ref="dataStorageConfiguration" />

  <!-- depends on profile! -->
  <property name="discoverySpi" ref="discoverySpi" />
  <property name="communicationSpi" ref="communicationSpi" />
</bean>


Arseny



​

On 23 March 2018 at 12:45, joseheitor <jo...@heitorprojects.com> wrote:

> Hi Arseny,
>
> Can this be set in the configuration file for each node? (like a property?)
>
> Our application currently aims to use Ignite purely as a distributed,
> persistent, cached, fault-tolerant SQL database through the client JDBC
> driver. It does not, and must not, instantiate or depend on any Ignite
> library components directly in the application code.
>
> Thanks

Re: [2.4.0] Cluster unrecoverable after node failure

Posted by joseheitor <jo...@heitorprojects.com>.
Hi Arseny,

Can this be set in the configuration file for each node? (like a property?)

Our application currently aims to use Ignite purely as a distributed,
persistent, cached, fault-tolerant SQL database through the client JDBC
driver. It does not, and must not, instantiate or depend on any Ignite
library components directly in the application code.

Thanks




Re: [2.4.0] Cluster unrecoverable after node failure

Posted by Arseny Kovalchuk <ar...@synesis.ru>.
Hi Jose.

What if you set IgniteConfiguration#setConsistentId for each node
explicitly? Say, node-a, node-b, etc.
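
A minimal sketch of that suggestion for a node started from a Spring XML
configuration file, assuming each node has its own file. The value node-a is
only the example name from above; the second node would use a different fixed
value such as node-b. Only the consistentId property is shown, and everything
else in the node's configuration is left out.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Illustrative value: give every node its own fixed ID
         (node-a here, node-b in the other node's file) -->
    <property name="consistentId" value="node-a"/>
    <!-- the remaining properties of the node's existing configuration go here -->
</bean>
```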



​

On 22 March 2018 at 18:34, joseheitor <jo...@heitorprojects.com> wrote:

> Hi Pavel,
>
>
> 1. Disconnect the database connection.
> 2. Stop all component processes on all nodes (Ctrl+C).
> 3. Delete the 'work' folder on the node on which I want to simulate an
> unrecoverable hardware failure.
>
> When the node is started up anew - it is like deploying a new instance on
> the same IP address... (without any data).
>
> Jose

Re: [2.4.0] Cluster unrecoverable after node failure

Posted by joseheitor <jo...@heitorprojects.com>.
Hi Pavel,


1. Disconnect the database connection.
2. Stop all component processes on all nodes (Ctrl+C).
3. Delete the 'work' folder on the node on which I want to simulate an
unrecoverable hardware failure.

When the node is started up anew - it is like deploying a new instance on
the same IP address... (without any data).

Jose




Re: [2.4.0] Cluster unrecoverable after node failure

Posted by Pavel Vinokurov <vi...@gmail.com>.
Hi,

>>
  4 - simulate power-failure ... all components down; Node-B with
unrecoverable damage (hardware)
>>

How do you emulate a failure with unrecoverable damage?

2018-03-22 14:34 GMT+03:00 joseheitor <jo...@heitorprojects.com>:

> I do apologise for the long-winded post earlier (with error stack-traces,
> etc.).
>
> And hope that someone can assist me with this issue - it is a basic,
> real-world scenario that tests the fundamental integrity of the clustering
> system!
>
> Am I perhaps missing something? Or mismanaging the cluster in such an
> occurrence? What is the 'best-practice' to recover from such a scenario?
>
> Here is the condensed version of the problem, which is hopefully easier to
> read (without the stack-traces):
>
> *Scenario: Secondary (Node-B) Failure*
>
> Environment:
>   - 2 nodes (Node-A, Node-B)
>   - Ignite native persistence enabled
>   - static IP discovery - both node IPs listed
>   - JDBC (Client) - DBeaver
>   - manual cluster activation
>
> Steps:
>   1 - start both nodes with no data
>   2 - activate cluster on same machine as Node-A
>   3 - load data via SQL JDBC (...WITH template=replicated, backups=1)
>   4 - simulate power-failure ... all components down; Node-B with
> unrecoverable damage (hardware)
>   5 - start new Node-B instance (with no data)
>   6 - attempt to start Node-A (undamaged, with good data)...
>
> PROBLEM: Unable to start Node-A. (Error in previous post below...)
>
> In an attempt to recover the cluster and data:
>   7 - stop Node-B
>   8 - start Node-A - first
>   9 - start Node-B (starts)
>   10 - attempt to activate cluster
>
> PROBLEM: Cluster activation operation freezes.
>
> Additional notes:
> - All data is lost and cannot be recovered.
> - This did not occur with Ignite 2.3.0, although data consistency there was
> unpredictable but would sometimes align after a period of time.
> - If Node-A (with data) is started before Node-B (new instance, empty
> data), both nodes start, but the cluster fails to activate. (See the post
> below for details of the error observed in the Node-A output.)
>
> ...



-- 

Regards

Pavel Vinokurov

Re: [2.4.0] Cluster unrecoverable after node failure

Posted by joseheitor <jo...@heitorprojects.com>.
I do apologise for the long-winded post earlier (with error stack-traces,
etc.).

And hope that someone can assist me with this issue - it is a basic,
real-world scenario that tests the fundamental integrity of the clustering
system!

Am I perhaps missing something? Or mismanaging the cluster in such an
occurrence? What is the 'best-practice' to recover from such a scenario?

Here is the condensed version of the problem, which is hopefully easier to
read (without the stack-traces):

*Scenario: Secondary (Node-B) Failure*

Environment:
  - 2 nodes (Node-A, Node-B)
  - Ignite native persistence enabled
  - static IP discovery - both node IPs listed
  - JDBC (Client) - DBeaver
  - manual cluster activation
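
For orientation, an environment like the one listed above might be configured
roughly as follows in Spring XML. This is only a sketch under assumptions: the
real configuration files are not shown in this message, the two addresses are
placeholders from a documentation address range, and the port range is the
usual Ignite discovery default rather than anything stated here.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Ignite native persistence for the default data region -->
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="persistenceEnabled" value="true"/>
                </bean>
            </property>
        </bean>
    </property>

    <!-- Static IP discovery listing both nodes (placeholder addresses) -->
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <value>192.0.2.10:47500..47509</value> <!-- Node-A, assumed -->
                            <value>192.0.2.11:47500..47509</value> <!-- Node-B, assumed -->
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```

With native persistence enabled, a fresh cluster starts inactive, which is why
the environment includes a manual activation step.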

Steps:
  1 - start both nodes with no data
  2 - activate cluster on same machine as Node-A
  3 - load data via SQL JDBC (...WITH template=replicated, backups=1)
  4 - simulate power-failure ... all components down; Node-B with
unrecoverable damage (hardware)
  5 - start new Node-B instance (with no data)
  6 - attempt to start Node-A (undamaged, with good data)...

PROBLEM: Unable to start Node-A. (Error in previous post below...) 

In an attempt to recover the cluster and data:
  7 - stop Node-B
  8 - start Node-A - first
  9 - start Node-B (starts)
  10 - attempt to activate cluster

PROBLEM: Cluster activation operation freezes.

Additional notes:
- All data is lost and cannot be recovered.
- This did not occur with Ignite 2.3.0, although data consistency there was
unpredictable but would sometimes align after a period of time.
- If Node-A (with data) is started before Node-B (new instance, empty data),
both nodes start, but the cluster fails to activate. (See the post below for
details of the error observed in the Node-A output.)

...


