Posted to user@ignite.apache.org by Chris Berry <ch...@gmail.com> on 2018/09/04 21:39:09 UTC

Determine Node Health?

Hi,

We are using an Ignite ComputeGrid, and it is mostly working nicely. 

Recently we had a Node with "Noisy Neighbors" in AWS that wreaked havoc on
our ComputeGrid.
Even though that Node was quite slow, it was never removed from the
map/reduce, which slowed down all computes.

We have already built a system that lets us add Nodes to, and remove them
from, the ComputeGrid based on when they are actually "ready to compute",
because our Nodes take considerable time to be truly ready for computation
(i.e. quite a bit of preparation is required).
To accomplish this, we use a dynamic Ignite ClusterGroup when we create
the compute.

```
ClusterGroup readyNodes =
readyForComputeMonitor.getNodesReadyForCompute(ignite.cluster());
log.debug(dumpClusterGroup(readyNodes));
return ignite.compute(readyNodes);
```

So, my question:
Does Ignite keep any information that we can use to determine whether a Node
is healthy?
I.e. is there some way to locate outliers in the ComputeGrid?

For example, the Node in our recent incident was at 100% CPU and was much,
much slower in the reduce phase.

Any help/advice would be much appreciated.

Thanks, 
-- Chris 





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Determine Node Health?

Posted by Stanislav Lukyanov <st...@gmail.com>.
You're creating a new cache on each health check call and never
destroying any of them. Of course that leads to a memory leak; it's also
awful for performance.

Don't create a new cache each time. If you really want to check that cache
operations work, use the same one every time.
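
A minimal sketch of that advice in plain Java, so that it runs without a
cluster. The class name, the "health_check" cache name and the factory
parameter are assumptions made for illustration: the Function stands in for a
lookup such as Ignite#getOrCreateCache(String), and a plain Map stands in for
IgniteCache.

```java
import java.util.Map;
import java.util.function.Function;

/**
 * Health check that reuses one named cache on every call.
 * Hypothetical helper, not part of Ignite.
 */
final class ReusableCacheHealthCheck {
    /** Fixed name with no timestamp suffix, so no new cache per call. */
    static final String CACHE_NAME = "health_check";

    private final Map<String, String> cache;

    ReusableCacheHealthCheck(Function<String, Map<String, String>> cacheFactory) {
        // The cache is looked up once here, not once per health check.
        this.cache = cacheFactory.apply(CACHE_NAME);
    }

    /** Returns true if a value written to the shared cache can be read back. */
    boolean check() {
        // A fresh value each time, so a stale entry cannot mask a failed put.
        String expected = Long.toString(System.nanoTime());
        cache.put("probe", expected);
        return expected.equals(cache.get("probe"));
    }
}
```

With a real cluster the factory argument could be something like
`name -> ignite.getOrCreateCache(name)` if the check is adapted to take an
IgniteCache. The sketch assumes a single caller; concurrent checks sharing
the "probe" key could overwrite each other's value.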

Thanks,
Stan


From: Jason.G
Sent: October 10, 2018, 8:49
To: user@ignite.apache.org
Subject: Re: Determine Node Health?



Re: Determine Node Health?

Posted by "Jason.G" <ig...@163.com>.
Hi vgrigorev,

I used your suggestion to run a health check on each node, but I hit a memory
leak and the process exits with an OOM error: Java heap space.

Below is my example code:

// I create one bean to collect the info I want (IP, hostname, create time)
// and then return it as a JSON string.
IgniteHealthCheckEntity healthCheck = new IgniteHealthCheckEntity();
ClusterNode node = ignite.cluster().localNode();
// addresses() and hostNames() return Collection<String>, so take the first
// element through an iterator instead of casting to List.
String ip = node.addresses().iterator().next();

String hostname = node.hostNames().iterator().next();
				
healthCheck.setServerIp(ip);
healthCheck.setStatus(0);
healthCheck.setServerHostname(hostname);
healthCheck.setMonitorTime(monitorTime);
healthCheck.setClientIp(clientIp);
String cacheName = "test_monitor_" + ipStr + "_"+ new Date().getTime();
				
IgniteCache<String, String> putCache = ignite.createCache(cacheName);
putCache.put("test", "test");
String value = putCache.get("test");
if(!"test".equals(value)) {
	message = "Ignite ("+ ip  +") " + "get/put value failed";
	healthCheck.setMessage(message);
	return JSONObject.fromObject(healthCheck).toString();
}else {
	message = "OKOKOK";
	healthCheck.setMessage(message);
	return JSONObject.fromObject(healthCheck).toString(); 
}






Re: Determine Node Health?

Posted by Chris Berry <ch...@gmail.com>.
Thanks to you both!





Re: Determine Node Health?

Posted by vgrigorev <vg...@mail.ru>.
I would propose making a periodic call to each node, one by one, with some
simple remote function.
Measure the response time of each node, and if a node is too slow for your
needs, avoid using that node for some period.

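How to choose the nodes for the call, single or many:

import java.util.Collection;
import java.util.UUID;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCompute;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.resources.IgniteInstanceResource;

        // nodeIds: a Collection<UUID> with the nodes to call (set the UUIDs here).
        IgniteCompute compute = ignite.compute(ignite.cluster().forNodeIds(nodeIds));
        final Collection<String> consistentIds = compute.broadcast(
                new IgniteCallable<String>() {
                    // Inject the Ignite instance of the node that runs the job.
                    @IgniteInstanceResource
                    private Ignite ignite;

                    @Override
                    public String call() throws Exception {
                        // consistentId() returns Object, so convert it explicitly.
                        String id = String.valueOf(
                                ignite.cluster().localNode().consistentId());
                        log.debug(" DIAGNOSTICS: node is `{}`", id);

                        return id;
                    }
                });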
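
The "measure, then avoid for some period" part could be tracked with a small
helper like the one below. This is only a sketch in plain Java: the
SlowNodeTracker class, its thresholds and the use of String node IDs are
assumptions for illustration, not anything Ignite provides. The caller passes
in the current time so the logic stays easy to test.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers which nodes answered a probe too slowly. */
final class SlowNodeTracker {
    private final long maxLatencyMs;
    private final long quarantineMs;
    /** Node ID mapped to the time (ms) until which the node is avoided. */
    private final Map<String, Long> quarantinedUntil = new HashMap<>();

    SlowNodeTracker(long maxLatencyMs, long quarantineMs) {
        this.maxLatencyMs = maxLatencyMs;
        this.quarantineMs = quarantineMs;
    }

    /** Records one probe result; nowMs is the caller's clock reading. */
    synchronized void recordProbe(String nodeId, long latencyMs, long nowMs) {
        if (latencyMs > maxLatencyMs) {
            // Too slow: avoid this node until the quarantine period is over.
            quarantinedUntil.put(nodeId, nowMs + quarantineMs);
        }
    }

    /** Returns true if the node may be used for computes at time nowMs. */
    synchronized boolean isUsable(String nodeId, long nowMs) {
        Long until = quarantinedUntil.get(nodeId);
        if (until == null) {
            return true;
        }
        if (nowMs >= until) {
            // Quarantine expired: forget the entry and allow the node again.
            quarantinedUntil.remove(nodeId);
            return true;
        }
        return false;
    }
}
```

The latency could be taken by reading System.nanoTime() before and after a
call to a single node, and isUsable() could then feed a predicate that
decides which nodes go into the ClusterGroup used for the compute.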




Re: Determine Node Health?

Posted by Alex Plehanov <pl...@gmail.com>.
Hello Chris,

There is no such metric as "node is healthy" at the moment, but each node
provides a lot of low-level metrics such as CPU usage, memory usage, job
execution/waiting time, etc., which you can combine to define your own
criteria for a "healthy node". These metrics are available cluster-wide and
contain information for each node; see the ClusterGroup#metrics() and
ClusterNode#metrics() methods.
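
As an illustration only, here is one way such metrics might be combined into
a "healthy node" criterion, written in plain Java so it runs without a
cluster. NodeSample, OutlierFilter and both thresholds are hypothetical; in a
real grid the two values would be read from ClusterNode#metrics(), for
example getCurrentCpuLoad() and getAverageJobExecuteTime(). A node is flagged
when its CPU load exceeds a limit or when its average job time is more than a
chosen multiple of the median across nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of one node's metrics (not an Ignite class). */
final class NodeSample {
    final String nodeId;
    final double cpuLoad;          // fraction in [0, 1]
    final double avgJobExecuteMs;  // average job execution time, in ms

    NodeSample(String nodeId, double cpuLoad, double avgJobExecuteMs) {
        this.nodeId = nodeId;
        this.cpuLoad = cpuLoad;
        this.avgJobExecuteMs = avgJobExecuteMs;
    }
}

/** Hypothetical filter that flags nodes which stand out from the rest. */
final class OutlierFilter {
    private final double maxCpuLoad;
    private final double maxSlowdownFactor;

    OutlierFilter(double maxCpuLoad, double maxSlowdownFactor) {
        this.maxCpuLoad = maxCpuLoad;
        this.maxSlowdownFactor = maxSlowdownFactor;
    }

    /** Returns the IDs of nodes that look unhealthy, in input order. */
    List<String> findOutliers(List<NodeSample> samples) {
        List<String> outliers = new ArrayList<>();
        if (samples.isEmpty()) {
            return outliers;
        }
        double median = medianJobTime(samples);
        for (NodeSample s : samples) {
            boolean cpuSaturated = s.cpuLoad > maxCpuLoad;
            boolean slow = median > 0
                    && s.avgJobExecuteMs > median * maxSlowdownFactor;
            if (cpuSaturated || slow) {
                outliers.add(s.nodeId);
            }
        }
        return outliers;
    }

    /** Median of the average job times; robust against a single slow node. */
    private static double medianJobTime(List<NodeSample> samples) {
        double[] times = samples.stream()
                .mapToDouble(s -> s.avgJobExecuteMs)
                .sorted()
                .toArray();
        int mid = times.length / 2;
        return times.length % 2 == 1
                ? times[mid]
                : (times[mid - 1] + times[mid]) / 2.0;
    }
}
```

The median is used instead of the mean so that one very slow node does not
shift the baseline it is compared against. The resulting IDs could be
excluded when building the ClusterGroup passed to ignite.compute(...).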

