Posted to user@ignite.apache.org by Biren <bi...@servicenow.com> on 2017/08/20 00:34:39 UTC

Cluster segmentation

Hi,
I have embedded Ignite into my application and am using it for distributed caches. I am running an Ignite cluster in my lab environment with two nodes. From time to time I get a node-segmented event, and the node that receives it dies abruptly.

The application is registered for three discovery events: node_left, node_failed and node_segmented. On receiving one of these events, each node checks whether it is now the oldest node in the cluster. This is to detect whether the oldest node has left or failed.

I am also listening to lifecycle events, specifically before_node_stop and after_node_stop. On receiving these events, I need to stop another component of the application.
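
Roughly, the registration looks like the sketch below. This is simplified: the event types, LifecycleBean and forOldest() call are the standard Ignite API, while the println calls stand in for the real application hooks.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lifecycle.LifecycleBean;
import org.apache.ignite.lifecycle.LifecycleEventType;

public class SegmentationListeners {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Discovery events are disabled by default, so enable the three we listen to.
        cfg.setIncludeEventTypes(
            EventType.EVT_NODE_LEFT,
            EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_SEGMENTED);

        // Lifecycle bean: called on BEFORE_NODE_STOP / AFTER_NODE_STOP.
        cfg.setLifecycleBeans((LifecycleBean) evtType -> {
            if (evtType == LifecycleEventType.BEFORE_NODE_STOP
                || evtType == LifecycleEventType.AFTER_NODE_STOP) {
                // Placeholder for stopping the dependent component.
                System.out.println("Lifecycle event: " + evtType);
            }
        });

        Ignite ignite = Ignition.start(cfg);

        // Local listener for the discovery events: check whether this node is now the oldest.
        ignite.events().localListen(evt -> {
            ClusterNode oldest = ignite.cluster().forOldest().node();
            boolean localIsOldest = oldest != null
                && oldest.id().equals(ignite.cluster().localNode().id());
            System.out.println(evt.name() + ", local node is oldest: " + localIsOldest);
            return true; // keep the listener registered
        }, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED, EventType.EVT_NODE_SEGMENTED);
    }
}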


  1.  What are the reasons for getting a node_segmented event?
     *   One obvious reason is a network glitch: the node losing connectivity with the other members.
     *   Can high memory usage or a long GC pause be a reason for segmentation?
     *   Is there a way to get the cause of the segmentation?
  2.  After getting the node_segmented event, I immediately got a before_node_stop event, but after_node_stop did not follow. So the node was left in an inconsistent state and never recovered.
     *   Is it possible that my attempt to get the oldest node in the cluster, on receiving the node_segmented event, caused the node to stop?
Event timeline from both nodes:

Application 1:
08/19/17  10:40:10 :  received node failed event. The event was caused by application 2.
08/19/17  10:40:10 :  [10:40:10] Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, heap=14.0GB]

Application 2:
08/19/17 10:40:28 : received node segmented event. The event was caused by application 2
08/19/17 10:40:28 : Checking if oldest has changed
08/19/17 10:40:28 : Ignite Lifecycle event received: BEFORE_NODE_STOP. Fires event to stop another component
08/19/17 10:40:28 : dependent component stops
08/19/17 10:40:28 : received node failed event. The event was caused by application 1.
08/19/17 10:40:10 : Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, heap=14.0GB]

Thanks,
Biren





Re: Cluster segmentation

Posted by luqmanahmad <lu...@gmail.com>.
See [1] for a free network segmentation plugin.

[1] https://github.com/luqmanahmad/ignite-plugins




Re: Cluster segmentation

Posted by Biren Shah <Bi...@servicenow.com>.
Hi Val,

Did you get a chance to look at the code snippet I shared?

If I understand correctly, when I do a get() on the cache, it creates a copy of the value and returns that copy. Do you think turning off that behavior will help?
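
If that refers to Ignite's copy-on-read behaviour, a minimal sketch of turning it off would be the following, reusing the getCacheConfiguration() override from the snippet I shared; whether this actually helps with the memory pressure is only a guess.

	@Override
	protected CacheConfiguration<RawPoint, RawPoint> getCacheConfiguration() {
		CacheConfiguration<RawPoint, RawPoint> cfg = new CacheConfiguration<RawPoint, RawPoint>();
		cfg.setOnheapCacheEnabled(true);
		// copyOnRead defaults to true; with false, get() returns the stored on-heap
		// instance instead of a copy, so callers must treat it as read-only.
		cfg.setCopyOnRead(false);
		return cfg;
	}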

Thanks,
Biren

On 8/24/17, 2:16 PM, "vkulichenko" <va...@gmail.com> wrote:

    Biren,
    
    Can you show the code of the receiver?
    
    -Val
    
    
    
    


Re: Cluster segmentation

Posted by Biren Shah <Bi...@servicenow.com>.
Here is a rough structure of a cache. IgniteBaseCache is a wrapper on top of IgniteCache. It initializes the cache and a streamer for the cache.  

public class NormalizedDataCache extends IgniteBaseCache<RawPoint, RawPoint> {

	public NormalizedDataCache() {
		super("cache_name");
	}

	@Override
	protected CacheConfiguration<RawPoint, RawPoint> getCacheConfiguration() {
		CacheConfiguration<RawPoint, RawPoint> normalizedPointsCfg = new CacheConfiguration<RawPoint, RawPoint>();
		normalizedPointsCfg.setOnheapCacheEnabled(true);
		return normalizedPointsCfg;
	}

	@Override
	protected void setStreamerProperties() {
		fStreamer.autoFlushFrequency(1000);
		fStreamer.perNodeParallelOperations(8);
		fStreamer.perNodeBufferSize(102400);
	}

	public void addData(RawPoint point) {
		// The identifier is the affinity (partition) key; the field is annotated with @AffinityKeyMapped in RawPoint.
		point.setIdentifier(fNormalizerUtil.getIdentifier(point));
		addToStream(point, point);
	}

	@Override
	protected StreamReceiver<RawPoint, RawPoint> getDataStreamerReceiver() {
		// normalize the raw data via DataStreamer's transform functionality.
		return StreamTransformer.from((e, arg) -> {
			new NormalizerAdapter().process((RawPoint) arg[0]);
			// Transformers are supposed to update the data and then write it to the cache,
			// but we are using this cache only to distribute data, so we are not writing to it.
			return null;
		});
	}
}

NormalizerAdapter is another wrapper around an internal class and is the first stage of the processing. That internal class uses other distributed caches and creates a different object, which gets added to yet another cache "B" via a streamer; that is the second stage of the processing. Cache "B" is similar to this one and has a similar receiver function, which updates the object and writes it to the application's internal structure. These two caches are used to distribute the data based on the affinity key; we are not storing the data in them.

After your suggestion yesterday, I updated the addData method in this snippet. Previously I was creating a key from some properties of RawPoint; now I have added the affinity key to RawPoint itself, which reduces the number of objects I was creating.
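
For reference, the key/value class now looks roughly like the sketch below; apart from the identifier, the field names and types are illustrative, not the real ones.

import org.apache.ignite.cache.affinity.AffinityKeyMapped;

public class RawPoint {
	/** Affinity (partition) key: points with the same identifier are processed on the same node. */
	@AffinityKeyMapped
	private String identifier;

	// Illustrative payload fields; the real class has more.
	private long timestamp;
	private double value;

	public void setIdentifier(String identifier) {
		this.identifier = identifier;
	}

	public String getIdentifier() {
		return identifier;
	}
}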

Thanks,
Biren

On 8/24/17, 2:16 PM, "vkulichenko" <va...@gmail.com> wrote:

    Biren,
    
    Can you show the code of the receiver?
    
    -Val
    
    
    
    


Re: Cluster segmentation

Posted by vkulichenko <va...@gmail.com>.
Biren,

Can you show the code of the receiver?

-Val




Re: Cluster segmentation

Posted by Biren Shah <Bi...@servicenow.com>.
Hi Val,

We do create lots of objects. I am getting 1M data points every minute, and for data affinity I create a key object for each data point. As I mentioned earlier, I have two stages of processing and we create a new key in both stages, so that is close to 2M new keys every minute. I have changed that and am running the test now.

Also, if I understand correctly, when I do a get() on the cache, it creates a copy of the object and returns that copy. Do you think turning off that behavior will help?

Thanks,
Biren

On 8/23/17, 5:51 PM, "vkulichenko" <va...@gmail.com> wrote:

    Biren,
    
    I see the jump, and I actually see GC pauses as well (the longest one is the
    last line in log_2.txt). BTW, I don't think there is a quick jump; the GC pause
    most likely blocks the monitor thread as well, so it just looks like a jump.
    Apparently all of these 30 seconds were spent in GC, and I'm pretty sure this
    is causing the issue.
    
    It looks like you're doing something that generates too many objects. My
    suggestion would be to use JFR [1] to profile object allocations and check
    what's going on.
    
    [1]
    https://urldefense.proofpoint.com/v2/url?u=https-3A__apacheignite.readme.io_docs_jvm-2Dand-2Dsystem-2Dtuning-23section-2Dflightrecorder-2Dsettings&d=DwICAg&c=Zok6nrOF6Fe0JtVEqKh3FEeUbToa1PtNBZf6G01cvEQ&r=rbkF1xy5tYmkV8VMdTRVaIVhaXCNGxmyTB5plfGtWuY&m=LiHHncH7191OkF4vZnSdvf7qmC9q13uRiNImGL2Grwk&s=Vr1gZsnAesfDJTrXQDi-tphsPD3lQ1NL8q4Q7l1y70E&e=
    
    It is allowed to use the cache API from a receiver. To remove an entry using
    the streamer, you can use the removeData() method.
    
    -Val
    
    
    
    


Re: Cluster segmentation

Posted by vkulichenko <va...@gmail.com>.
Biren,

I see the jump, and I actually see GC pauses as well (the longest one is the
last line in log_2.txt). BTW, I don't think there is a quick jump; the GC pause
most likely blocks the monitor thread as well, so it just looks like a jump.
Apparently all of these 30 seconds were spent in GC, and I'm pretty sure this
is causing the issue.

It looks like you're doing something that generates too many objects. My
suggestion would be to use JFR [1] to profile object allocations and check
what's going on.

[1]
https://apacheignite.readme.io/docs/jvm-and-system-tuning#section-flightrecorder-settings

It is allowed to use the cache API from a receiver. To remove an entry using
the streamer, you can use the removeData() method.
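
For example, a minimal sketch, assuming the data point class and cache name from earlier in the thread; if I recall correctly, removal through the streamer only applies when allowOverwrite(true) is set.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class StreamerRemovalExample {
    /** Removes a single key through the streamer rather than the cache API. */
    static void removeViaStreamer(Ignite ignite, RawPoint keyToRemove) {
        try (IgniteDataStreamer<RawPoint, RawPoint> streamer = ignite.dataStreamer("cache_name")) {
            streamer.allowOverwrite(true); // removals are only applied when overwrite is enabled
            streamer.removeData(keyToRemove);
            streamer.flush();              // push the buffered removal out immediately
        }
    }
}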

-Val




Re: Cluster segmentation

Posted by Biren Shah <Bi...@servicenow.com>.
Hi Val,

I took a thread dump and a heap dump on the node which got segmented. I don't see any long GC pauses. I have attached a couple of log files and the thread dump. Would you mind taking a look at them?

Look at the log entries around 8/22/17 14:27 in both log files. In the first log you can see that within 30 seconds of receiving the segmented event, memory usage jumps from 4 GB to 9 GB.

When I receive data, I use a data streamer to add it to the cache. I do some processing of the data via the data streamer transformer, and in the transformer function I access other Ignite caches. When I receive the BEFORE_NODE_STOP event, I stop the application from receiving any new data, but I might still have data in the streamer that has not been flushed or deleted yet. Caches are not available while the node is shutting down, so the transformer function kind of hangs.

In the thread dump, a few threads look stuck. Is it safe to access other caches in the data streamer's StreamTransformer function? What is the way to clear/delete data from the streamer?
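
One option for the BEFORE_NODE_STOP handler would be to cancel the streamer rather than wait for a flush; a rough sketch, where fStreamer stands for the wrapper's streamer field and close(true) is the IgniteDataStreamer call that discards buffered operations:

import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.lifecycle.LifecycleEventType;

public class StreamerShutdown {
    /** Called from the BEFORE_NODE_STOP lifecycle hook. */
    static void cancelStreamer(LifecycleEventType evt, IgniteDataStreamer<?, ?> fStreamer) {
        if (evt == LifecycleEventType.BEFORE_NODE_STOP && fStreamer != null) {
            // close(true) cancels buffered and in-flight operations instead of flushing
            // them, so shutdown does not block on caches that are already stopping.
            fStreamer.close(true);
        }
    }
}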

Thanks,
Biren


On 8/22/17, 4:02 PM, "vkulichenko" <va...@gmail.com> wrote:

    Biren,
    
    A segmented node considers the other nodes to have failed. I don't think it's
    an issue that you got a node_failed event.
    
    As for the increased memory consumption, I don't know what caused it. I would
    recommend taking a heap dump and investigating.
    
    -Val
    
    
    
    


Re: Cluster segmentation

Posted by vkulichenko <va...@gmail.com>.
Biren,

A segmented node considers the other nodes to have failed. I don't think it's
an issue that you got a node_failed event.

As for the increased memory consumption, I don't know what caused it. I would
recommend taking a heap dump and investigating.

-Val




Re: Cluster segmentation

Posted by Biren Shah <Bi...@servicenow.com>.
Hi Val,

Thanks for explaining the difference between the events. After some reading, I have more or less figured out that I don't need to check anything once I get the segmented event. You mentioned that nodes still in the topology get the node_failed/left events, so why did I get a node_failed event on the segmented node? See the event timeline I shared in the first post.

The application processes a huge amount of time-series data, which we receive at a set interval. The data processing has two stages. Once we get the data, I distribute it for first-stage processing based on a key; after the first stage, we redistribute the data for the second stage based on a different key. For both stages we have a bunch of other metadata which is also kept in distributed caches. Some of it is small, so I have replicated it on all nodes; some of it is huge, so it is partitioned. This metadata does not change a lot.
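
For illustration, the cache-mode split looks roughly like the sketch below; the cache names and key/value types are made up, and PARTITIONED is Ignite's default mode anyway.

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class MetadataCacheConfigs {
    /** Small, rarely-changing metadata: a full copy on every node. */
    static CacheConfiguration<String, Object> smallMetadataCfg() {
        return new CacheConfiguration<String, Object>("small_metadata")
            .setCacheMode(CacheMode.REPLICATED);
    }

    /** Large metadata: spread across the cluster by affinity key. */
    static CacheConfiguration<String, Object> largeMetadataCfg() {
        return new CacheConfiguration<String, Object>("large_metadata")
            .setCacheMode(CacheMode.PARTITIONED);
    }
}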

The other issue I faced was that once one of the nodes gets segmented, the other node dies as well. It dies because heap usage jumps instantly on that node, from 4 GB to 11 GB. Very strange. Any idea what could cause this?

Thanks,
Biren

On 8/21/17, 6:53 PM, "vkulichenko" <va...@gmail.com> wrote:

    Hi Biren,
    
    What is the use case and what are you trying to achieve by all this?
    
    First of all, there is a difference between the node_left/failed and
    node_segmented events. The former are fired on nodes that are still in the
    topology to notify them that one of the nodes left or failed. The latter
    means that the *local* node got segmented, and I don't think it makes sense
    to do any checks there.
    
    Segmentation can happen for various reasons, but in the vast majority of
    cases it's a long GC pause. In this case the node does not close its
    connections but becomes unresponsive, which causes the cluster to remove it
    from the topology after the failure detection timeout. When the GC pause
    finishes, the node tries to continue operating, but realizes that it was
    already kicked out. It then fires the node_segmented event locally and stops
    immediately. This is correct behavior.
    
    -Val
    
    
    
    


Re: Cluster segmentation

Posted by vkulichenko <va...@gmail.com>.
Hi Biren,

What is the use case and what are you trying to achieve by all this?

First of all, there is a difference between the node_left/failed and
node_segmented events. The former are fired on nodes that are still in the
topology to notify them that one of the nodes left or failed. The latter
means that the *local* node got segmented, and I don't think it makes sense
to do any checks there.

Segmentation can happen for various reasons, but in the vast majority of
cases it's a long GC pause. In this case the node does not close its
connections but becomes unresponsive, which causes the cluster to remove it
from the topology after the failure detection timeout. When the GC pause
finishes, the node tries to continue operating, but realizes that it was
already kicked out. It then fires the node_segmented event locally and stops
immediately. This is correct behavior.
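
For reference, a minimal sketch of the related configuration knobs; the 30-second value is only an illustration (the default failure detection timeout is 10 seconds), and STOP is already the default segmentation policy.

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

public class GcTolerantConfig {
    static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Allow longer unresponsiveness (e.g. GC pauses) before the cluster drops the node.
        cfg.setFailureDetectionTimeout(30_000);
        // What the local node does once it learns it was segmented; STOP is the default.
        cfg.setSegmentationPolicy(SegmentationPolicy.STOP);
        return cfg;
    }
}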

-Val


