Posted to user@ignite.apache.org by userx <ga...@gmail.com> on 2017/08/21 18:15:30 UTC

ignite.active(true) blocking forever

Hi all,

*QUESTION: Is there a way to time out the ignite.active(true) call so that
the client does not block forever and can move on to other important
processing?*
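
One generic way to bound a blocking call is to run it on a separate thread
and wait on a Future with a timeout. The sketch below uses only
java.util.concurrent; the class name ActivationTimeout and the helper
runWithTimeout are hypothetical and not part of Ignite. In the client from
this post the task would be () -> ignite.active(true). This is a workaround
sketch, not an Ignite feature.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Ignite API.
final class ActivationTimeout {
    /**
     * Runs the task on a daemon thread and waits at most the given time.
     * Returns true if the task finished in time, false if it timed out
     * or the waiting thread was interrupted.
     */
    static boolean runWithTimeout(Runnable task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "activation-caller");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } catch (ExecutionException e) {
            // The task itself failed; surface its cause to the caller.
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // With Ignite on the classpath the task would be: () -> ignite.active(true)
        boolean fast = runWithTimeout(() -> { }, 1, TimeUnit.SECONDS);
        boolean stuck = runWithTimeout(() -> {
            try {
                Thread.sleep(60_000); // stands in for a call that never returns
            } catch (InterruptedException ignored) {
                // cancelled by the timeout
            }
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println("fast finished: " + fast + ", stuck finished: " + stuck);
    }
}
```

Note that the timeout only frees the calling thread. It does not withdraw
the activation request, and a blocked call may ignore the interrupt, which
is why the worker is a daemon thread.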

*OBSERVATIONS:*

I have a client (-Xms512m -Xmx512m) represented by the following code:

package org.apache.ignite.examples;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class DataGridClient {

	public static void main(String[] args) {
		// Start this node as a client and join the cluster.
		Ignition.setClientMode(true);
		Ignite ignite = Ignition.start("examples/config/example-ignite.xml");
		System.out.println("Before Active");
		// Request cluster activation; this is the call that never returns.
		ignite.active(true);
		System.out.println("After Active");
		CacheConfiguration<Integer, Integer> cfg = new CacheConfiguration<>("1");
		IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(cfg);
		cache.put(1, 1);
	}

}

I have 2 servers (2 different JVMs on the same machine), *deliberately*
started with -Xms128m -Xmx128m, each running the following code:
Ignite ignite = Ignition.start("config/example-ignite.xml");

The client and both servers are started on the same machine.

What I observe is that the text "After Active" is never printed and
ignite.active(true) blocks forever. Meanwhile, the server logs show an
OOM error:
[23:34:47] Ignite node started OK (id=825a2757)
[23:34:47] Topology snapshot [ver=1, servers=1, clients=0, CPUs=4,
heap=0.13GB]
TcpDiscoveryNode [id=825a2757-71b6-4483-90d3-0a0b1d7b1923,
addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.1.3,
fd9c:b2b2:d704:2000:4cbb:b964:b3fb:9269,
fd9c:b2b2:d704:2000:68ac:5531:940e:b851],
sockAddrs=[Garima-PC/192.168.1.3:47501,
/fd9c:b2b2:d704:2000:4cbb:b964:b3fb:9269:47501,
/fd9c:b2b2:d704:2000:68ac:5531:940e:b851:47501, /0:0:0:0:0:0:0:1:47501,
/127.0.0.1:47501], discPort=47501, order=1, intOrder=1,
lastExchangeTime=1503338687228, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a,
isClient=false]
[23:34:47] Topology snapshot [ver=2, servers=2, clients=0, CPUs=4,
heap=0.25GB]
[23:35:41] Topology snapshot [ver=3, servers=2, clients=1, CPUs=4,
heap=0.75GB]
[23:35:41] Default checkpoint page buffer size is too small, setting to an
adjusted value: 519.5 MiB
[23:35:41,881][ERROR][exchange-worker-#34%null%][GridDhtPartitionsExchangeFuture]
Failed to reinitialize local partitions (preloading will be stopped):
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1], nodeId=825a2757, evt=DISCOVERY_CUSTOM_EVT]
java.lang.OutOfMemoryError: null
	at sun.misc.Unsafe.allocateMemory(Native Method) ~[?:1.8.0_111]
	at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1054) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.mem.unsafe.UnsafeMemoryProvider.nextRegion(UnsafeMemoryProvider.java:80) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.start(PageMemoryImpl.java:276) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.startMemoryPolicies(IgniteCacheDatabaseSharedManager.java:194) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.onActivate(IgniteCacheDatabaseSharedManager.java:949) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.onActivate(GridCacheDatabaseSharedManager.java:459) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.GridCacheSharedContext.activate(GridCacheSharedContext.java:244) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:762) ~[ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:574) [ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901) [ignite-core-2.1.0.jar:2.1.0]
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.1.0.jar:2.1.0]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
[23:35:41,889][ERROR][exchange-worker-#34%null%][GridCachePartitionExchangeManager]
Runtime error caught during grid runnable execution: GridWorker
[name=partition-exchanger, igniteInstanceName=null, finished=false,
hashCode=724661, interrupted=false, runner=exchange-worker-#34%null%]




 



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/ignite-active-true-blocking-forever-tp16346.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: ignite.active(true) blocking forever

Posted by Yakov Zhdanov <yz...@apache.org>.
Then you may have data loss?

--Yakov

2017-09-06 12:51 GMT+03:00 Вячеслав Коптилин <sl...@gmail.com>:

> Hi Yakov,
>
> At first sight, it seems that we can use the same approach for this issue,
> but in this particular case I think the best way to handle the
> OutOfMemoryError is to shut down the node that could not be activated.
>
> Best regards,
> Slava.
>
> 2017-09-06 11:26 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:
>
>> Slava, this looks like the issue where a cache start operation throws an
>> exception on some nodes and the cache start is rolled back. I remember you
>> implemented that fix. Can we apply the same machinery to roll back the
>> activation with a proper exception?
>>
>> --Yakov
>>
>
>

Re: ignite.active(true) blocking forever

Posted by Вячеслав Коптилин <sl...@gmail.com>.
Hi Yakov,

At first sight, it seems that we can use the same approach for this issue,
but in this particular case I think the best way to handle the
OutOfMemoryError is to shut down the node that could not be activated.

Best regards,
Slava.

2017-09-06 11:26 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:

> Slava, this looks like the issue where a cache start operation throws an
> exception on some nodes and the cache start is rolled back. I remember you
> implemented that fix. Can we apply the same machinery to roll back the
> activation with a proper exception?
>
> --Yakov
>

Re: ignite.active(true) blocking forever

Posted by Yakov Zhdanov <yz...@apache.org>.
Slava, this looks like the issue where a cache start operation throws an
exception on some nodes and the cache start is rolled back. I remember you
implemented that fix. Can we apply the same machinery to roll back the
activation with a proper exception?

--Yakov

Re: ignite.active(true) blocking forever

Posted by "slava.koptilin" <sl...@gmail.com>.
Hi,

I am sorry for the delay.

I was able to reproduce this issue, and it looks like a bug.
I created a JIRA ticket to track it:
https://issues.apache.org/jira/browse/IGNITE-6274

Thanks,
Slava.




Re: ignite.active(true) blocking forever

Posted by userx <ga...@gmail.com>.
Hi Slava,

Yes, but in that case at least the default JVM options should definitely
work:

-Xms128m -Xmx128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=D:\GC
-XX:OnOutOfMemoryError="kill -9 %p"

The process is not killed, and no heap dump is created either. I have
attached a VisualVM snapshot as well. Could you please look into this?

application-1503730020705.apps
<http://apache-ignite-users.70518.x6.nabble.com/file/n16425/application-1503730020705.apps>  

Here is the log

Exception in thread "exchange-worker-#34%null%" java.lang.OutOfMemoryError
	at sun.misc.Unsafe.allocateMemory(Native Method)
	at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1054)
	at org.apache.ignite.internal.mem.unsafe.UnsafeMemoryProvider.nextRegion(UnsafeMemoryProvider.java:80)
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.start(PageMemoryImpl.java:276)
	at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.startMemoryPolicies(IgniteCacheDatabaseSharedManager.java:194)
	at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.onActivate(IgniteCacheDatabaseSharedManager.java:949)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.onActivate(GridCacheDatabaseSharedManager.java:459)
	at org.apache.ignite.internal.processors.cache.GridCacheSharedContext.activate(GridCacheSharedContext.java:244)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:762)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:574)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:745)
[12:08:59,100][SEVERE][query-#37%null%][msg] Received message without
registered handler (will ignore) [msg=GridCacheQueryRequest [id=3,
cacheName=ignite-sys-cache, type=SCAN, fields=false, clause=null,
clsName=null, keyValFilter=null, rdc=null, trans=null, pageSize=1024,
incBackups=false, cancel=false, incMeta=false, all=false, keepBinary=false,
subjId=c713b24d-ea10-4f96-8835-9da202ea8aa6, taskHash=0, part=-1,
topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1],
super=GridCacheIdMessage [cacheId=-2100569601]],
node=c713b24d-ea10-4f96-8835-9da202ea8aa6, locTopVer=AffinityTopologyVersion
[topVer=-1, minorTopVer=0], msgTopVer=AffinityTopologyVersion [topVer=2,
minorTopVer=1], desc=DynamicCacheDescriptor
[deploymentId=0dd934d1e51-51535f95-889d-4fb9-adce-93165ed946d4,
staticCfg=true, sql=false, cacheType=UTILITY, template=false,
updatesAllowed=true, cacheId=-2100569601,
rcvdFrom=77e87467-40a6-40b0-8b94-e8a7805157b9, objCtx=null,
rcvdOnDiscovery=false, startTopVer=AffinityTopologyVersion [topVer=2,
minorTopVer=0], rcvdFromVer=AffinityTopologyVersion [topVer=1,
minorTopVer=0], clientCacheStartVer=null, schema=QuerySchema [],
grpDesc=CacheGroupDescriptor [grpId=-2100569601, grpName=null,
startTopVer=null, rcvdFrom=77e87467-40a6-40b0-8b94-e8a7805157b9,
deploymentId=0dd934d1e51-51535f95-889d-4fb9-adce-93165ed946d4,
caches={ignite-sys-cache=-2100569601}, rcvdFromVer=AffinityTopologyVersion
[topVer=1, minorTopVer=0], cacheName=ignite-sys-cache],
cacheName=ignite-sys-cache]]
Registered listeners:






Re: ignite.active(true) blocking forever

Posted by "slava.koptilin" <sl...@gmail.com>.
Hi,

I don't think there is a way to properly recover an application, a server,
or any other service once an out-of-memory error occurs.
Even trying to send a simple notification may require another memory
allocation and therefore throw a new OOME.
So the best way to handle an OOME is to treat it as an unrecoverable error.
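
As a rough illustration of treating an OOME as fatal, the sketch below
installs an uncaught-exception handler that runs a caller-supplied action
when an OutOfMemoryError reaches it. The class FailFastOnOom and its methods
are hypothetical names. It assumes the error actually escapes the thread
uncaught; whether Ignite's worker threads let it escape is not confirmed in
this thread. In a real server the action would typically be
Runtime.getRuntime().halt(1), which stops the JVM without running shutdown
hooks that might allocate more memory.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of the Ignite API.
final class FailFastOnOom {
    /** Returns true if the throwable or any of its causes is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof OutOfMemoryError)
                return true;
        }
        return false;
    }

    /** Builds a handler that runs the given action when an OOME reaches it. */
    static Thread.UncaughtExceptionHandler handler(Runnable onFatal) {
        return (thread, error) -> {
            if (isOutOfMemory(error))
                onFatal.run();
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo action only sets a flag; a server would halt the JVM instead.
        AtomicBoolean fired = new AtomicBoolean();
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        });
        worker.setUncaughtExceptionHandler(handler(() -> fired.set(true)));
        worker.start();
        worker.join();
        System.out.println("handler fired: " + fired.get());
    }
}
```

To cover every thread in the process, the same handler could be registered
with Thread.setDefaultUncaughtExceptionHandler instead of on one thread.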

Thanks!




Re: ignite.active(true) blocking forever

Posted by userx <ga...@gmail.com>.
Thanks Slava,

The recommended configuration worked for me. But I believe there is a more
fundamental question here. If there is an error situation (OOM), why doesn't
the server shut itself down? Had I not applied the configuration change you
recommended, I am not sure the data grid server would be of any use, because
subsequent activations would fail as well.

To detect such failure events, I wrote the following code, but no failure
event is caught.

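The server code I am using is in ExampleNodeStartup:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteException;
import org.apache.ignite.Ignition;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class ExampleNodeStartup {
    /**
     * Start up an empty node with example compute configuration.
     *
     * @param args Command line arguments, none required.
     * @throws IgniteException If failed.
     */
    public static void main(String[] args) throws IgniteException {
        Ignite ignite = Ignition.start("config/example-ignite.xml");

        // Used both as the local listener and as the wait filter;
        // it always returns false.
        IgnitePredicate<Event> failureEvents = e -> {
            return false;
        };
        ignite.events().localListen(failureEvents, EventType.EVTS_ERROR);
        // Wait for a local error event accepted by the predicate.
        Event event = ignite.events().waitForLocal(failureEvents, EventType.EVTS_ERROR);
        if (event != null) {
            System.out.println("Event:" + event.name());
            ignite.close();
            Ignition.stop(true);
        }
        System.out.println("After");
    }
}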

Here is the log

[23:38:49,330][SEVERE][exchange-worker-#34%null%][GridDhtPartitionsExchangeFuture]
Failed to reinitialize local partitions (preloading will be stopped):
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1], nodeId=521eceaf, evt=DISCOVERY_CUSTOM_EVT]
java.lang.OutOfMemoryError
	at sun.misc.Unsafe.allocateMemory(Native Method)
	at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1054)
	at org.apache.ignite.internal.mem.unsafe.UnsafeMemoryProvider.nextRegion(UnsafeMemoryProvider.java:80)
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.start(PageMemoryImpl.java:276)
	at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.startMemoryPolicies(IgniteCacheDatabaseSharedManager.java:194)
	at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.onActivate(IgniteCacheDatabaseSharedManager.java:949)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.onActivate(GridCacheDatabaseSharedManager.java:459)
	at org.apache.ignite.internal.processors.cache.GridCacheSharedContext.activate(GridCacheSharedContext.java:244)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:762)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:574)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:745)
[23:38:49,332][SEVERE][exchange-worker-#34%null%][GridCachePartitionExchangeManager]
Runtime error caught during grid runnable execution: GridWorker
[name=partition-exchanger, igniteInstanceName=null, finished=false,
hashCode=8631591, interrupted=false, runner=exchange-worker-#34%null%]
java.lang.OutOfMemoryError


How can I listen for failure events so that, when one occurs, the server
instance is stopped and is no longer part of the topology?




Re: ignite.active(true) blocking forever

Posted by "slava.koptilin" <sl...@gmail.com>.
Hi,

By default, each Ignite instance takes 80% of the available memory on
startup, which may cause the operating system to start swapping and slow
everything down.

As you mentioned, you are starting 2 JVMs on the same machine, so it seems
that you need to reduce the memory size (the defaultMemoryPolicySize
property) [1].

[1]
https://apacheignite.readme.io/v2.1/docs/memory-configuration#memory-policies
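
To make the arithmetic concrete, here is a small sketch of how a per-node
value for defaultMemoryPolicySize could be derived. The class MemoryBudget,
the 8 GiB host size, and the 50% usable share are illustrative assumptions,
not values from this thread; only the 80% default comes from the text above.

```java
// Hypothetical helper for sizing; not part of the Ignite API.
final class MemoryBudget {
    /** Default share of memory per node, as stated in this message. */
    static final int DEFAULT_SHARE_PERCENT = 80;

    /** Total demand, in percent of RAM, if every node keeps the default. */
    static int defaultDemandPercent(int nodesOnHost) {
        return nodesOnHost * DEFAULT_SHARE_PERCENT;
    }

    /** Splits a chosen percentage of physical RAM evenly between the nodes on one host. */
    static long perNodeBytes(long physicalRamBytes, int usablePercent, int nodesOnHost) {
        if (physicalRamBytes <= 0 || nodesOnHost <= 0 || usablePercent <= 0 || usablePercent > 100)
            throw new IllegalArgumentException("invalid budget parameters");
        return physicalRamBytes * usablePercent / 100 / nodesOnHost;
    }

    public static void main(String[] args) {
        long ram = 8L * 1024 * 1024 * 1024; // assumed 8 GiB host
        System.out.println("Default demand with 2 nodes: " + defaultDemandPercent(2) + "% of RAM");
        long perNode = perNodeBytes(ram, 50, 2); // assumed: give Ignite half the RAM in total
        System.out.println("Per-node budget: " + perNode / (1024 * 1024) + " MiB");
        // With Ignite 2.1 on the classpath, this value would be set through
        // the defaultMemoryPolicySize property of the memory configuration.
    }
}
```

With two nodes on one host, the defaults alone add up to more than the
machine has, which is consistent with the allocation failure in the server
logs.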

Thanks!


