Posted to user@ignite.apache.org by Gianluca Bonetti <gi...@gmail.com> on 2018/10/02 08:31:12 UTC

Troubles in restarting Ignite with persistence enabled

Hello everyone

This is my first question to the mailing list, which I have been following
for some time to get hints about using Ignite.
Until now I have used it in the development of other software, and Ignite
always rocked and made the difference, hence I literally love it :)

Now I am facing trouble restarting an Apache Ignite instance in a new
product we are developing and testing.
Previously, I have been using Apache Ignite with a custom loader from a
database, but this time we wanted to go with a "cache centric" approach and
use only Ignite Persistence, as there is no need to integrate with
databases or JDBC tools.
So the Ignite instance is the main and only storage.

The software is a monitoring platform, which receives small chunks of data
(about 500 bytes each) and stores them in different caches, depending on
the source address.
The number of incoming data packets is really low as we are only testing,
let's say around 100 packets per minute.
The software is running in a testing environment, so only one server is
deployed at the moment.

The software can run for weeks with no problem; the caches get bigger and
bigger and everything runs fine and fast.
Then if we restart the software, it takes ages to restart, and actually
most of the time the initial restart of Ignite never completes.
So we have to delete the persistence storage files to be able to start
again.
As we are only testing, we can still live with it.

We get just one message in the logs: "Ignite node stopped in the middle of
checkpoint. Will restore memory state and finish checkpoint on node start."
The client instances connecting to Ignite log: "
org.apache.ignite.logger.java.JavaLogger.info Join cluster while cluster
state transition is in progress, waiting when transition finish."
But it never finishes.

Speaking of sizes, when running tests with no interruption, the cache grew
up to 50 GB, with no degradation in performance or data loss.
The issues with restarting start as soon as the cache grows to ~4 GB.
The other software I developed using Ignite, with a custom database loader,
never had problems with large caches in memory.

The testing server is a dedicated Linux machine with an 8-core Xeon
processor, 64 GB RAM, and SATA disks on software mdraid.
The JVM is OpenJDK 8, started with "-server -Xms24g -Xmx24g
-XX:MaxMetaspaceSize=1g -XX:+AlwaysPreTouch -XX:+UseG1GC
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+AggressiveOpts"

For starting the Ignite instance, I am one of those (the last?) who prefer
Java code to XML files.
I recently switched off PeerClassLoading and added the
BinaryTypeConfiguration, which I hadn't specified previously, but it didn't
help.

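import java.util.Arrays;
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryTypeConfiguration;
import org.apache.ignite.configuration.BinaryConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public static final Ignite newInstance(List<String> remotes) {
    DataStorageConfiguration storage = new DataStorageConfiguration();
    DataRegionConfiguration region =
        storage.getDefaultDataRegionConfiguration();
    BinaryConfiguration binary = new BinaryConfiguration();
    TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();
    TcpDiscoverySpi discovery = new TcpDiscoverySpi();
    IgniteConfiguration config = new IgniteConfiguration();

    // Native persistence: data, WAL and WAL archive directories.
    storage.setStoragePath("/home/ignite/data");
    storage.setWalPath("/home/ignite/wal");
    storage.setWalArchivePath("/home/ignite/archive");

    // Default data region: persistent, fixed at 16 GB.
    region.setPersistenceEnabled(true);
    region.setInitialSize(16L * 1024 * 1024 * 1024);
    region.setMaxSize(16L * 1024 * 1024 * 1024);

    // Binary marshalling settings for the Datum class.
    binary.setCompactFooter(false);
    binary.setTypeConfigurations(Arrays.asList(
        new BinaryTypeConfiguration(Datum.class.getCanonicalName())));

    // Static IP discovery using the addresses passed in.
    finder.setAddresses(remotes);
    discovery.setIpFinder(finder);

    config.setDataStorageConfiguration(storage);
    config.setBinaryConfiguration(binary);
    config.setPeerClassLoadingEnabled(false);
    config.setDiscoverySpi(discovery);
    config.setClientMode(false);

    // Start the server node, then activate the cluster.
    Ignite ignite = Ignition.start(config);
    ignite.cluster().active(true);
    return ignite;
}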

Datum is a small POJO class with nearly 100 fields; each instance should be
less than 500 bytes of data.
There are nearly 200 caches in use, all containing Datum objects (at
least for now).

I am quite sure I am missing something when starting the instance, but I
cannot figure out what.

Is there a way to inspect the progress of the checkpoint at startup?
I cannot do anything with Ignite Visor as it will not connect until the
cluster activation finishes.

If you have any suggestions, let me know.

Thank you very much!
Best regards
Gianluca

Re: Troubles in restarting Ignite with persistence enabled

Posted by Gianluca Bonetti <gi...@gmail.com>.

Hello everyone

I have run some tests following Hamed Zahedifar's suggestion about
-XX:+AlwaysPreTouch, which pointed to the RedHat thread.
For now, I simply ran the tests with -XX:+AlwaysPreTouch removed from the
JVM startup command.
It restarts dramatically faster than before.
With the same amount of data, around 4 GB, it starts in less than 15
seconds, while with the -XX:+AlwaysPreTouch flag it does not finish startup
even in 20 minutes.
For now I feel completely satisfied with the performance.
Thanks everyone for the support, and especially Hamed for pointing out the
flag's behaviour.
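
To guard against the flag creeping back into a startup script, a small
check like the one below could be run at application startup. This is only
a sketch using the standard java.lang.management API; the class and method
names (PreTouchCheck, hasAlwaysPreTouch) are made up for illustration, and
the check is a plain string match, so it does not account for a later
-XX:-AlwaysPreTouch overriding an earlier one.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class PreTouchCheck {
    /** Returns true if the given JVM arguments contain -XX:+AlwaysPreTouch. */
    static boolean hasAlwaysPreTouch(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+AlwaysPreTouch");
    }

    public static void main(String[] args) {
        // Arguments the running JVM was actually started with.
        List<String> jvmArgs =
            ManagementFactory.getRuntimeMXBean().getInputArguments();

        if (hasAlwaysPreTouch(jvmArgs)) {
            System.err.println("Warning: JVM started with -XX:+AlwaysPreTouch");
        } else {
            System.out.println("-XX:+AlwaysPreTouch is not set");
        }
    }
}
```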

Cheers
Gianluca

Re: Troubles in restarting Ignite with persistence enabled

Posted by akurbanov <an...@gmail.com>.

Hello Gianluca,

There might be several reasons for this slowdown. Could you please provide
a couple of consecutive thread dumps taken while the node startup hangs,
along with the node log? If possible, could you also restart with the
-DIGNITE_QUIET=false and -ea JVM flags?
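
Thread dumps are usually taken from outside the process, for example by
running the JDK's jstack tool against the node's PID a few times in a row.
If that is not convenient, the sketch below shows one way to take them from
inside the JVM with the standard ThreadMXBean API. It is an illustration
only: the class name, the number of dumps and the interval are assumptions,
and since Ignition.start(...) is the call that blocks, the loop would have
to run on a separate thread started before it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    /** Builds a text dump of all live threads in the current JVM. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();

        // Request lock details only where the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
            threads.isObjectMonitorUsageSupported(),
            threads.isSynchronizerUsageSupported());

        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            // Print every frame; ThreadInfo.toString() truncates the stack.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        int count = 3;              // assumed number of consecutive dumps
        long intervalMillis = 5000; // assumed pause between dumps

        for (int i = 0; i < count; i++) {
            System.out.println("=== Thread dump " + (i + 1) + " ===");
            System.out.println(dump());
            if (i < count - 1) {
                Thread.sleep(intervalMillis);
            }
        }
    }
}
```

Comparing consecutive dumps shows which threads stay in the same stack
frames, which is typically what helps locate where startup is stuck.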

Regards,
Anton


