Posted to user@ignite.apache.org by gweiske <gw...@eagleinvsys.com> on 2019/01/08 12:55:26 UTC

Ignite 2.7 Persistence

I am using Ignite 2.7 with persistence enabled on a single VM with 128 GB RAM
in Azure and separate external HDD drives each for wal, walarchive and
storage. I loaded 20 GB of data/50,000,000 rows, then shut down Ignite and
restarted the hosting VM, started and activated Ignite and ran a simple query
that requires sorting through all the data (SELECT DISTINCT <column> FROM ;).
The query has been running for hours now. Looking at the memory, instead
of the expected ~42 GB it is currently at 5.7GB (*slowly* increasing). Any
ideas why it might be that slow? 
The same scenario with SSD drives (this time 1 drive for wal and walarchive,
a second one for storage) finishes in about 5500 seconds (still slow).



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite 2.7 Persistence

Posted by Dmitry Lazurkin <di...@gmail.com>.

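Some Java code that helps me on node startup:

import javax.cache.Cache;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

// Call for each partition in parallel.
// Assumes an `ignite` field that holds the started Ignite instance.
private void preloadPartition(int partition) {
    IgniteCache<String, BinaryObject> cache = ignite
            .cache("test_cache")
            .withKeepBinary();

    // The filter rejects every entry, so no rows come back,
    // but the scan still has to visit every entry of the partition.
    ScanQuery<String, BinaryObject> query = new ScanQuery<>(partition, (k, v) -> false);
    query.setLocal(true); // scan only on the local node

    try (QueryCursor<Cache.Entry<String, BinaryObject>> cursor = cache.query(query)) {
        for (@SuppressWarnings("unused") Cache.Entry<String, BinaryObject> row : cursor) {
            // empty: iterating the cursor is what drives the scan
        }
    }
}

// Call for each index
private void preloadIndex(String index) {
    // Use an SQL query that uses the index and contains an always-false condition
}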

PS. My memory region is bigger than total data size.
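
The "call for each partition in parallel" step above could be driven by a small thread-pool helper along these lines. This is only a sketch and not part of the original snippet: the class name PartitionPreloader, the method name preloadAll and the thread count are made up for illustration, and the per-partition work is passed in as a callback so that the helper needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Hypothetical helper: runs a per-partition callback on a fixed thread pool.
final class PartitionPreloader {
    /**
     * Calls preload.accept(p) once for every partition p in [0, partitions)
     * using the given number of threads, and waits until all calls finish.
     */
    static void preloadAll(int partitions, int threads, IntConsumer preload)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int partition = p;
                futures.add(pool.submit(() -> preload.accept(partition)));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows a failure from any preload task
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With the method above it might be called as PartitionPreloader.preloadAll(ignite.affinity("test_cache").partitions(), 8, this::preloadPartition), where the partition count is taken from the cache's affinity and 8 threads is an arbitrary choice.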

On 1/11/19 18:20, gweiske wrote:
> Is there a command that one can/needs to run to load the data into memory
> after restart of Ignite? The documentation suggests that at least for 2.7
> that is not necessary, and I have not found a command that would start the
> loading into memory from persistence. It looks like one can write some Java
> code, but it seems such basic functionality that I thought that there should
> be a shell command.

RE: Ignite 2.7 Persistence

Posted by gweiske <gw...@eagleinvsys.com>.

Is there a command that one can/needs to run to load the data into memory
after restart of Ignite? The documentation suggests that at least for 2.7
that is not necessary, and I have not found a command that would start the
loading into memory from persistence. It looks like one can write some Java
code, but it seems such basic functionality that I thought that there should
be a shell command.

RE: Ignite 2.7 Persistence

Posted by Stanislav Lukyanov <st...@gmail.com>.

Running the query the first time isn’t really like loading all data into memory and then doing the query.
I would assume that it is much less efficient – all kinds of locking and contention may be involved.
Also, the reads are done via random disk access, while when reading from CSV you’re reading sequentially.
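
To put rough numbers on the random-versus-sequential point, here is a back-of-the-envelope sketch. The device figures are assumptions chosen for illustration, not measurements from this thread: about 150 random reads per second for an HDD and about 100 MB/s of sequential throughput. It also assumes one random read per 4 KB page (Ignite's default page size) and that all 20 GB from the original post has to be touched.

```java
// Hypothetical helper for a rough cold-read estimate; all inputs are assumptions.
final class ColdReadEstimate {
    /** Number of fixed-size pages needed to hold dataBytes (rounded up). */
    static long pages(long dataBytes, long pageBytes) {
        return (dataBytes + pageBytes - 1) / pageBytes;
    }

    /** Seconds to read every page with one random I/O each at the given IOPS. */
    static double randomReadSeconds(long dataBytes, long pageBytes, double iops) {
        return pages(dataBytes, pageBytes) / iops;
    }

    /** Seconds to stream the data sequentially at the given throughput. */
    static double sequentialReadSeconds(long dataBytes, double bytesPerSecond) {
        return dataBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long data = 20L << 30;  // 20 GB from the original post, taken as GiB (assumption)
        long page = 4096;       // Ignite default page size in bytes
        double hddIops = 150;   // assumed random reads per second for an HDD
        double seqBps = 100e6;  // assumed sequential throughput in bytes per second

        System.out.printf("pages: %d%n", pages(data, page));
        System.out.printf("random: %.1f h%n", randomReadSeconds(data, page, hddIops) / 3600);
        System.out.printf("sequential: %.1f min%n", sequentialReadSeconds(data, seqBps) / 60);
    }
}
```

Under these assumptions the random-read estimate comes out on the order of hours and the sequential one on the order of minutes, which has the same shape as the gap reported in this thread.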

I assume that there are ways to make queries on cold storage more efficient.
One would probably need to spend a lot of time collecting and analyzing JFRs and other profiling data to get there.
On the other hand, having the ability to do a hot restart will probably solve the issue for most users.

Stan

From: gweiske
Sent: January 11, 2019, 2:03
To: user@ignite.apache.org
Subject: RE: Ignite 2.7 Persistence

Thanks for the replies. Yes, subsequent queries are faster, but the time to
run the query the first time (i.e. load the data into memory) after a
restart can be measured in hours and is significantly longer than loading
the data from a csv file. That does not seem right.

RE: Ignite 2.7 Persistence

Posted by gweiske <gw...@eagleinvsys.com>.

Thanks for the replies. Yes, subsequent queries are faster, but the time to
run the query the first time (i.e. load the data into memory) after a
restart can be measured in hours and is significantly longer than loading
the data from a csv file. That does not seem right.

RE: Ignite 2.7 Persistence

Posted by Stanislav Lukyanov <st...@gmail.com>.

Hi,

That’s right, Ignite nodes restart “cold”, meaning that they become operational without the data in RAM.
This allows a node to restart as quickly as possible, but the price is that the first operations have to load data from disk, so their performance will be much lower.

Here is a ticket for enabling a “hot restart” mode - https://issues.apache.org/jira/browse/IGNITE-10152.
There is also an improvement that allows manually loading the data of a specific partition in an efficient way - https://issues.apache.org/jira/browse/IGNITE-8873. If you iterate over all partitions after the node starts, it may shorten the warmup period.

Stan 

From: Glenn Wiebe
Sent: January 8, 2019, 18:02
To: user@ignite.apache.org
Subject: Re: Ignite 2.7 Persistence

I am new to Ignite, but as I understand it, after a cluster restart, data is re-hydrated into memory as the nodes receive requests for their partitions' entries. So, a first query would be as slow as a distributed disk-based query. Subsequent queries should find some of the data (depending on memory available) already in memory and thus be faster.

So, my question, is this the first query execution since startup?
Given that you have sufficient memory to hold this particular cache, I would expect subsequent query executions to take advantage of memory resident query processing.

Additionally, I took a quick look (but could not find anything) at whether Ignite keeps aggregates (like counts) in memory, which could then be returned without reading the actual data, as in this case.

Good luck!


Re: Ignite 2.7 Persistence

Posted by Glenn Wiebe <gl...@gridgain.com>.

I am new to Ignite, but as I understand it, after a cluster restart, data is
re-hydrated into memory as the nodes receive requests for their partitions'
entries. So, a first query would be as slow as a distributed disk-based
query. Subsequent queries should find some of the data (depending on memory
available) already in memory and thus be faster.

So, my question, is this the first query execution since startup?
Given that you have sufficient memory to hold this particular cache, I
would expect subsequent query executions to take advantage of memory
resident query processing.

Additionally, I took a quick look (but could not find anything) at whether
Ignite keeps aggregates (like counts) in memory, which could then be
returned without reading the actual data, as in this case.

Good luck!

On Tue, Jan 8, 2019 at 7:55 AM gweiske <gw...@eagleinvsys.com> wrote:

> I am using Ignite 2.7 with persistence enabled on a single VM with 128 GB RAM
> in Azure and separate external HDD drives each for wal, walarchive and
> storage. I loaded 20 GB of data/50,000,000 rows, then shut down Ignite and
> restarted the hosting VM, started and activated Ignite and ran a simple query
> that requires sorting through all the data (SELECT DISTINCT <column> FROM ;).
> The query has been running for hours now. Looking at the memory, instead
> of the expected ~42 GB it is currently at 5.7GB (*slowly* increasing). Any
> ideas why it might be that slow?
> The same scenario with SSD drives (this time 1 drive for wal and walarchive,
> a second one for storage) finishes in about 5500 seconds (still slow).