Posted to user@ignite.apache.org by Usman Waheed <us...@gmail.com> on 2018/01/01 11:20:50 UTC

Re: Ignite Cluster Error

Logs from the 2-node Ignite cluster are attached.

node1.tar
<https://drive.google.com/file/d/1Agtkn4v2Q2EAa65tgYj-CHSa-qI2kIUf/view?usp=drive_web>
node2.tar
<https://drive.google.com/file/d/1b2sI6Ojk60VqYPj-bGODKyYMW6GpdaOu/view?usp=drive_web>

On Fri, Dec 29, 2017 at 1:10 PM, Usman Waheed <us...@gmail.com>
wrote:

> Thanks Evgenii, I will get back to this thread.
>
> On Fri, Dec 29, 2017 at 1:09 PM, Evgenii Zhuravlev <
> e.zhuravlev.wk@gmail.com> wrote:
>
>> Please provide logs from all nodes with -DIGNITE_QUIET=false for
>> investigation
>>
>> Evgenii
>>
>> 2017-12-29 11:03 GMT+03:00 Usman Waheed <us...@gmail.com>:
>>
>>> A correction on my end: we increased the timeouts to see if that would
>>> resolve our problem, but no luck.
>>> So we can set them back to the default settings.
>>>
>>> I am also pasting some more settings:
>>>
>>> While searching for a resolution, I stumbled upon
>>> https://issues.apache.org/jira/browse/IGNITE-6555, which I don't think
>>> is related to my problem.
>>>
>>>
>>> <property name="memoryConfiguration">
>>>     <bean class="org.apache.ignite.configuration.MemoryConfiguration">
>>>         <!-- Set the size of default memory region to 4GB. -->
>>>         <property name="defaultMemoryPolicySize" value="#{400L * 1024 * 1024 * 1024}"/>
>>>     </bean>
>>> </property>
>>> <property name="communicationSpi">
>>>     <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>>         <property name="messageQueueLimit" value="1024"/>
>>>         <property name="slowClientQueueLimit" value="512"/>
>>>         <property name="idleConnectionTimeout" value="3600000"/>
>>>         <property name="sharedMemoryPort" value="-1"/>
>>>     </bean>
>>> </property>
>>> <property name="socketTimeout" value="600000"/>
>>> <property name="networkTimeout" value="600000"/>
>>> <property name="joinTimeout" value="600000"/>
>>> <property name="ackTimeout" value="50000"/>
>>> <property name="statisticsPrintFrequency" value="20000"/>
>>>
>>> On Fri, Dec 29, 2017 at 12:42 PM, Evgenii Zhuravlev <
>>> e.zhuravlev.wk@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Why did you set such large timeouts? Why don't the default timeouts work
>>>> for you?
>>>>
>>>> Evgenii
>>>>
>>>> 2017-12-29 10:35 GMT+03:00 Usman Waheed <us...@gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have deployed Apache Ignite Fabric 2.3.
>>>>>
>>>>> We get the error below when trying to run on more than one node.
>>>>>
>>>>> GridTimeoutProcessor: Timeout has occurred: CancelableTask
>>>>> [id=970ee7b2061-c1565aa4-510c-4046-9ebb-46efd861b4df,
>>>>> endTime=1512558510454, period=5000, cancel=false,
>>>>> task=org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager$BackupCleaner@2c898c3e]
>>>>>
>>>>> The code runs fine on one node, but whenever a new node joins it gives the
>>>>> above error. We are using the properties below to form the cluster. Any
>>>>> pointers or help would be much appreciated.
>>>>>
>>>>>
>>>>>
>>>>> <property name="discoverySpi">
>>>>>     <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>>>>         <property name="socketTimeout" value="600000"/>
>>>>>         <property name="networkTimeout" value="600000"/>
>>>>>         <property name="joinTimeout" value="600000"/>
>>>>>         <property name="ackTimeout" value="50000"/>
>>>>>         <property name="statisticsPrintFrequency" value="20000"/>
>>>>>
>>>>>         <property name="ipFinder">
>>>>>             <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>>>                 <property name="addresses">
>>>>>                     <list>
>>>>>                         <value>localIP:47500..47509</value>
>>>>>                         <value>remoteIP:47500..47509</value>
>>>>>                     </list>
>>>>>                 </property>
>>>>>             </bean>
>>>>>         </property>
>>>>>     </bean>
>>>>> </property>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Ignite Cluster Error

Posted by Usman Waheed <us...@gmail.com>.
Hi Denis,

I did not see any other errors but let me dig more and will share.

Best Regards,
Usman



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite Cluster Error

Posted by Usman Waheed <us...@gmail.com>.
Yes Sir, that exception is from my code that has caused ignite2 (second
ignite node) to go offline :)
Investigating now ...




Re: Ignite Cluster Error

Posted by Denis Mekhanikov <dm...@gmail.com>.
Usman,

This exception is thrown from your code, so you will have to deal with it
yourself :)
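
For what it's worth, the ClassCastException in the trace says an L2_DPI value reached code that casts to L1_TriggerRecord. One common way to deal with that is to check the type before casting. The sketch below is only an illustration under assumptions: the two model classes are empty stand-ins for the real ones, and the process method and routed list are hypothetical names, not the actual RouterLogic code.

```java
import java.util.ArrayList;
import java.util.List;

public class RouterGuardSketch {
    // Hypothetical stand-ins for the application's model classes.
    static class L1_TriggerRecord {
        final String id;
        L1_TriggerRecord(String id) { this.id = id; }
    }

    static class L2_DPI {
        final String id;
        L2_DPI(String id) { this.id = id; }
    }

    // Ids of the records that were actually routed, kept so the effect is visible.
    static final List<String> routed = new ArrayList<>();

    // Returns true if the value was routed, false if it was skipped
    // because it is not an L1_TriggerRecord.
    static boolean process(Object newVal) {
        if (!(newVal instanceof L1_TriggerRecord)) {
            // Skip (or log) instead of throwing from inside the event listener.
            return false;
        }
        L1_TriggerRecord rec = (L1_TriggerRecord) newVal;
        routed.add(rec.id);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(process(new L1_TriggerRecord("a"))); // true
        System.out.println(process(new L2_DPI("b")));           // false
        System.out.println(routed);                             // [a]
    }
}
```

Whether skipping is the right reaction depends on the application. The other option is to register the listener only for the cache that holds L1 records, so that events from CACHE_L2_DPI never reach it.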

I would also recommend resetting all timeouts to their default values. They
seem to be too high.
As Ignite suggests in the log, failure detection will work faster if you
don't assign such large values to the timeout properties.

Denis

Re: Ignite Cluster Error

Posted by Usman Waheed <us...@gmail.com>.
Hi Denis,

I saw the following error message in the logs on one of my ignite nodes
(ml-ignite1.mobilink.osa):

02/01/2018 10:39:12  WARN [tcp-disco-msg-worker-#2%jazz-prod-triggers-anum%]
TcpDiscoverySpi: Local node has detected failed nodes and started
cluster-wide procedure. To speed up failure detection please see 'Failure
Detection' section under javadoc for 'TcpDiscoverySpi'
02/01/2018 10:39:12  WARN [disco-event-worker-#105%jazz-prod-triggers-anum%]
GridDiscoveryManager: Node FAILED: TcpDiscoveryNode
[id=07826b34-d933-4400-ba3b-5270dc5df63d, addrs=[10.145.1.14, 127.0.0.1],
sockAddrs=[/127.0.0.1:47500, ml-ignite2.mobilink.osa/10.145.1.14:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1514888749012,
loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false]
02/01/2018 10:39:12  INFO [disco-event-worker-#105%jazz-prod-triggers-anum%]
GridDiscoveryManager: Topology snapshot [ver=3, servers=1, clients=0,
CPUs=56, heap=10.0GB]

It looks like the ignite2 node, which was part of the cluster, went offline.
The error message on the ignite2 node was:

02/01/2018 11:03:20 ERROR [sys-stripe-18-#19%jazz-prod-triggers-anum%]
GridEventStorageManager: Unexpected exception in listener notification for
event: CacheEvent [cacheName=CACHE_L2_DPI, part=289,
key=7a442a957218405ab9d04b8080a9f1e9, xid=null, lockId=GridCacheVersion
[topVer=126371000, order=1514890998457, nodeOrder=1], newVal=L2_DPI
[id=7a442a957218405ab9d04b8080a9f1e9, srcId=TR_DPI_G,
hasParsingErrors=FALSE, AFK_MSISDN=923089045308, MSISDN=923089045308,
IMSI=410018407922922, UPLINK_TRAFFIC=120, DOWNLINK_TRAFFIC=151,
STARTTIMESECOND=1514832983, PROTOCOL_CATEGORY=4, APPLICATION=250,
RAT_TYPE=1, IMEISV=357056089204360, TAC=35705608, CGI=, SAI=41001F0D3FF6C4,
ECGI=, TAI=, entryTime=1514891000, isProcessed=FALSE], oldVal=null,
hasOldVal=false, hasNewVal=true, near=false,
subjId=d148804e-1cce-4c6e-900f-8c7ee017715d, cloClsName=null, taskName=null,
nodeId8=d148804e, evtNodeId8=d148804e, msg=Cache event.,
type=CACHE_OBJECT_PUT, tstamp=1514891000479]
java.lang.ClassCastException: com.thinkbiganalytics.veon.trigger.modules.common.models.L2.L2_DPI cannot be cast to com.thinkbiganalytics.veon.trigger.modules.common.models.L1.L1_TriggerRecord
        at com.thinkbiganalytics.veon.trigger.modules.core.controller.events.router.RouterLogic.process(RouterLogic.java:39)
        at com.thinkbiganalytics.veon.trigger.modules.core.controller.events.listener.ListenerLocalCache$1.apply(ListenerLocalCache.java:33)
        at com.thinkbiganalytics.veon.trigger.modules.core.controller.events.listener.ListenerLocalCache$1.apply(ListenerLocalCache.java:24)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1438)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:895)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:344)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:301)
        at org.apache.ignite.internal.processors.cache.GridCacheEventManager.addEvent(GridCacheEventManager.java:327)
        at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:1794)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2437)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1847)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$2900(GridDhtAtomicCache.java:129)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$30.apply(GridDhtAtomicCache.java:1703)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$30.apply(GridDhtAtomicCache.java:1689)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:382)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:346)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:334)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:494)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:473)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onDone(GridDhtForceKeysFuture.java:153)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onDone(GridDhtForceKeysFuture.java:69)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)
        at org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:278)
        at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:148)
        at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:382)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:346)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:334)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:494)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:473)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)






Re: Ignite Cluster Error

Posted by Denis Mekhanikov <dm...@gmail.com>.
Hi Usman!

"Timeout has occurred", printed by *GridTimeoutProcessor*, is an internal
debug message that tells you that a scheduled task is being executed. It's
not an error.

You also shouldn't worry about "connection refused" exceptions on node
startup. They are thrown for every address specified in the IP finder that
rejects the connection. This is also part of the debug information.
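
To illustrate why several such messages can show up: an IP finder entry with a port range, such as 10.145.1.14:47500..47509, stands for ten host:port candidates, and only the ports where a node is actually listening will accept a connection. The sketch below is plain Java written for this explanation, not Ignite's own code; the class name and the expand method are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class AddressRangeSketch {
    // Expands "host:from..to" into one "host:port" candidate per port.
    // An entry without a range ("host:port") is returned as a single candidate.
    // Assumes the entry always contains a ":port" part.
    static List<String> expand(String entry) {
        int colon = entry.lastIndexOf(':');
        String host = entry.substring(0, colon);
        String ports = entry.substring(colon + 1);
        List<String> out = new ArrayList<>();
        int dots = ports.indexOf("..");
        if (dots < 0) {
            out.add(entry);
            return out;
        }
        int from = Integer.parseInt(ports.substring(0, dots));
        int to = Integer.parseInt(ports.substring(dots + 2));
        for (int p = from; p <= to; p++) {
            out.add(host + ":" + p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> candidates = expand("10.145.1.14:47500..47509");
        System.out.println(candidates.size());  // 10
        System.out.println(candidates.get(0));  // 10.145.1.14:47500
    }
}
```

With two entries of ten ports each and only one node per host, most of the twenty candidates have nothing listening, so refused connections on startup are expected.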

So is there anything that doesn't work on a topology with 2 nodes, apart
from those messages that you see in the log?

Denis

Re: Ignite Cluster Error

Posted by Usman Waheed <us...@gmail.com>.
To address the connection refused error in the logs, we have set the
ipFinder property to:

<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="socketTimeout" value="600000"/>
        <property name="networkTimeout" value="600000"/>
        <property name="joinTimeout" value="600000"/>
        <property name="ackTimeout" value="50000"/>
        <property name="statisticsPrintFrequency" value="20000"/>

        <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                    <list>
                        <value>10.145.1.14:47500..47509</value>
                        <value>10.145.1.15:47500..47509</value>
                    </list>
                </property>
            </bean>
        </property>
    </bean>
</property>



