Posted to user@ignite.apache.org by Sebastian Macke <se...@macke.de> on 2021/04/13 18:28:43 UTC

Fastest way to iterate over a persistence cache

Hi Ignite Team,

I have stumbled across a problem when iterating over a persistence cache
that does not fit into memory.

The partitioned cache consists of 50M entries across 3 nodes with a total
cache size of 3*80GB on the volumes.

I use either a ScanQuery or a SQL query over a non-indexed table. Both
perform the same.

It can take over an hour to iterate over the entire cache. The problem seems
to be that the cache is read from the volume in random 4 kB (page size)
chunks, without any parallelism. A page size of 8 kB exactly doubles the
iteration speed.
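
A rough back-of-the-envelope model of the numbers above, as a sketch only. It assumes that each node scans its own 80 GB strictly serially (one page read in flight at a time), that "over an hour" means roughly 3600 seconds, and that GB means 10^9 bytes. The class and method names are illustrative and not part of Ignite.

```java
public class PageReadEstimate {
    /** Number of fixed-size pages needed to hold the given number of bytes. */
    public static long pages(long bytes, long pageSize) {
        return (bytes + pageSize - 1) / pageSize;
    }

    /** Seconds for a strictly serial scan: one page read in flight at a time. */
    public static double serialScanSeconds(long bytes, long pageSize, double latencyMicros) {
        return pages(bytes, pageSize) * latencyMicros / 1_000_000.0;
    }

    public static void main(String[] args) {
        long perNode = 80L * 1_000_000_000L; // 80 GB per node (decimal GB is an assumption)
        double observedSeconds = 3600.0;     // "over an hour", taken as one hour (assumption)

        // Per-page latency implied by the observed scan time with 4 kB pages.
        long pages4k = pages(perNode, 4096);
        double impliedMicros = observedSeconds * 1_000_000.0 / pages4k;

        System.out.printf("4 kB pages per node: %d%n", pages4k);
        System.out.printf("implied latency per page: %.0f us%n", impliedMicros);
        // If the cost per read is dominated by latency, 8 kB pages halve the read count.
        System.out.printf("same latency, 8 kB pages: %.0f s%n",
                serialScanSeconds(perNode, 8192, impliedMicros));
    }
}
```

Under these assumptions, 80 GB is about 19.5 million 4 kB pages, which implies roughly 184 microseconds per page read. A scan whose cost is dominated by a fixed latency per read is consistent with the observation that doubling the page size doubles the speed.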

Is this Ignite's default behaviour? Is there an option to enable a more
streaming-like solution?
Of course, the order of the items in the cache doesn't matter.

Thanks,

Sebastian 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Fastest way to iterate over a persistence cache

Posted by Ilya Kazakov <ka...@gmail.com>.
Hi Sebastian.

You can try the QueryParallelism feature, and also the LazyLoading
feature. More details are here:
https://ignite.apache.org/docs/latest/perf-and-troubleshooting/sql-tuning

Thanks,

Ilya


Re: Fastest way to iterate over a persistence cache

Posted by Sebastian Macke <se...@macke.de>.
Thanks Stephen, 

that did the trick in combination with the preloadPartition function.
My code is no longer IO-limited but CPU-limited, and runs several times
faster.

Regards
Sebastian






Re: Fastest way to iterate over a persistence cache

Posted by Stephen Darlington <st...@gridgain.com>.
You may need to do the parallelism “manually.” Something like:

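// needs: java.util.Arrays, org.apache.ignite.cache.query.ScanQuery
int parts = ignite.affinity("TEST").partitions();
for (int p = 0; p < parts; p++) {
    final int part = p; // a lambda can only capture effectively final locals
    // Run the job on the node that owns this partition.
    ignite.compute().affinityRunAsync(Arrays.asList("TEST"), part, () -> {
        // Scan only this one partition.
        ScanQuery<Object, Object> q = new ScanQuery<>().setPartition(part);
        System.out.println(ignite.cache("TEST").query(q).getAll());
    });
}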

Regards,
Stephen




Re: Fastest way to iterate over a persistence cache

Posted by Sebastian Macke <se...@macke.de>.
Hi,

of course we are using SSDs, but 4 kB is still a tiny amount when you have
to wait for the SSD read roundtrip every time. The CPU is mostly idle, at
about 10% load. The Linux kernel does not seem to perform a readahead,
probably because of the random access pattern.

For some reason the QueryParallelism feature does not work either. No change
in the speed.

Regards

Sebastian







Re: Fastest way to iterate over a persistence cache

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

We once had per-page scanning of caches, but it has been disabled for some
time because it was causing synchronization issues.

Apache Ignite is still a memory-centric database, which assumes that data
is either in memory or can be loaded into memory relatively quickly.

So I guess the only cache scan option currently is to read all blocks in
random order.

We also assume that a persistence setup uses SSDs, which have random read
speeds on par with sequential ones (and the term "sequential" may not be
applicable to SSDs at all). If your setup is based on HDDs, it may indeed not
work optimally.
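
To check how a given volume behaves under the access pattern discussed in this thread (one blocking, page-sized read at a time), a small stand-alone probe can help. The following is a minimal sketch in plain Java, not Ignite code: the class name PageReadPattern, the fixed shuffle seed and the 4096-byte page size are illustrative assumptions. It reads a file page by page, once in file order and once in shuffled order, and prints the elapsed time for each pass. Note that the OS page cache will hide any difference unless the file is much larger than RAM or has not been read recently.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PageReadPattern {
    /** Reads each listed page once with blocking positional reads; returns total bytes read. */
    public static long readPages(Path file, int pageSize, List<Long> pageOrder) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(pageSize);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            for (long page : pageOrder) {
                buf.clear();
                long pos = page * pageSize;
                // A single read may return fewer bytes; loop until the page is full or EOF.
                while (buf.hasRemaining()) {
                    int n = ch.read(buf, pos + buf.position());
                    if (n < 0) break;
                }
                total += buf.position();
            }
        }
        return total;
    }

    /** Page indexes covering the whole file, in file order or shuffled with a fixed seed. */
    public static List<Long> pageOrder(long fileSize, int pageSize, boolean shuffle) {
        long pages = (fileSize + pageSize - 1) / pageSize;
        List<Long> order = new ArrayList<>();
        for (long p = 0; p < pages; p++) order.add(p);
        if (shuffle) Collections.shuffle(order, new Random(42));
        return order;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageReadPattern.java <file>");
            return;
        }
        Path file = Path.of(args[0]);
        int pageSize = 4096; // matches the 4 kB page size mentioned in the thread
        long size = Files.size(file);
        for (boolean shuffle : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long bytes = readPages(file, pageSize, pageOrder(size, pageSize, shuffle));
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%s: %d bytes in %.2f s%n",
                    shuffle ? "random" : "sequential", bytes, seconds);
        }
    }
}
```

With Java 11 or later this can be started directly as `java PageReadPattern.java /path/to/large/file`. Because only one read is ever in flight, the random pass is bounded by per-read latency rather than by the bandwidth of the device.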

Regards,
-- 
Ilya Kasnacheev

