Posted to dev@ignite.apache.org by Zhenya Stanilovsky <ar...@mail.ru.INVALID> on 2018/09/19 06:58:57 UTC

Re[4]: Cache scan efficiency

Great, I didn't think about that.
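
The trade-off in the message quoted below (a bulk warmup evicting hot pages, versus warming up inside a separate data region) can be illustrated with a toy model. This is only a sketch under stated assumptions: `LruRegion`, `sharedRegion` and `separateRegions` are hypothetical names, and plain LRU over a `LinkedHashMap` stands in for Ignite's real page replacement, which it does not reproduce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegionSketch {
    /** Toy fixed-capacity page store with LRU eviction (an assumption, not Ignite's policy). */
    public static class LruRegion {
        private final int capacity;
        private final LinkedHashMap<Long, Boolean> pages;

        public LruRegion(int capacity) {
            this.capacity = capacity;
            // accessOrder=true keeps entries ordered from least to most recently used.
            this.pages = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                    return size() > LruRegion.this.capacity;
                }
            };
        }

        /** Loads or re-references a page, evicting the least recently used one if full. */
        public void touch(long pageId) { pages.put(pageId, Boolean.TRUE); }

        /** Checks residency without changing the LRU order. */
        public boolean resident(long pageId) { return pages.containsKey(pageId); }
    }

    /** Counts how many of the hot pages [0, hot) are still resident. */
    public static int residentHot(LruRegion region, int hot) {
        int n = 0;
        for (long p = 0; p < hot; p++)
            if (region.resident(p)) n++;
        return n;
    }

    /** One shared region: hot pages and the bulk warmup compete for the same capacity. */
    public static int sharedRegion(int capacity, int hot, int warmupPages) {
        LruRegion shared = new LruRegion(capacity);
        for (long p = 0; p < hot; p++) shared.touch(p);
        for (long p = 0; p < warmupPages; p++) shared.touch(1_000_000L + p);
        return residentHot(shared, hot);
    }

    /** Two regions: the warmup can only evict pages inside its own region. */
    public static int separateRegions(int capacity, int hot, int warmupPages) {
        LruRegion hotRegion = new LruRegion(capacity / 2);
        LruRegion scanRegion = new LruRegion(capacity - capacity / 2);
        for (long p = 0; p < hot; p++) hotRegion.touch(p);
        for (long p = 0; p < warmupPages; p++) scanRegion.touch(1_000_000L + p);
        return residentHot(hotRegion, hot);
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + sharedRegion(100, 40, 500));    // prints "shared:   0"
        System.out.println("separate: " + separateRegions(100, 40, 500)); // prints "separate: 40"
    }
}
```

In this model the warmup of 500 pages pushes all 40 hot pages out of a shared 100-page region, while with two 50-page regions the hot pages stay resident. The price is that each region has less memory to itself.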


>Wednesday, 19 September 2018, 9:40 +03:00 from Vladimir Ozerov <vo...@gridgain.com>:
>
>Pinning is an even worse thing, because you lose control over how data is moved
>within a single region. Instead, I would suggest using partition warmup +
>a separate data region to achieve "pinning" semantics.
>
>On Wed, Sep 19, 2018 at 8:34 AM Zhenya Stanilovsky
>< arzamas123@mail.ru.invalid > wrote:
>
>> Hi, but how do we deal with page replacements, which Dmitriy Pavlov mentioned?
>> This approach would be efficient if all data fits into memory; maybe it would
>> be better to have a method to pin some critical caches?
>>
>>
>> >Wednesday, 19 September 2018, 0:26 +03:00 from Dmitriy Pavlov <
>>  dpavlov.spb@gmail.com >:
>> >
>> >Even better, if RAM is exhausted, the page replacement process will be started:
>> >
>> >https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk)
>> >
>> >The effect of the preloading will still be noticeable, but not as large as
>> >when everything fits into RAM. Later I can review or improve the javadocs if
>> >necessary.
>> >
>> >Wed, 19 Sep 2018 at 0:18, Denis Magda <  dmagda@apache.org >:
>> >
>> >> Agreed, it's just a matter of documentation. If a user stores 100% of the
>> >> data in RAM and on disk, and just wants to warm RAM up after a restart, then
>> >> he knows everything will fit there. If during the preloading we detect that
>> >> the RAM is exhausted, we can halt it and print out a warning.
>> >>
>> >> --
>> >> Denis
>> >>
>> >> On Tue, Sep 18, 2018 at 2:10 PM Dmitriy Pavlov <  dpavlov.spb@gmail.com >
>> >> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I totally support the idea of cache preload.
>> >> >
>> >> > IMO it can be expanded. We can iterate over the local partitions of the
>> >> > cache group and preload each.
>> >> >
>> >> > But these should be clearly documented methods, so that a user is aware
>> >> > of when such a method is beneficial (e.g. if the RAM region is big enough).
>> >> >
>> >> > Sincerely,
>> >> > Dmitriy Pavlov
>> >> >
>> >> > Tue, 18 Sep 2018 at 21:36, Denis Magda <  dmagda@apache.org >:
>> >> >
>> >> > > Folks,
>> >> > >
>> >> > > Since we're adding a method that would preload a certain partition,
>> >> > > can we add one that will preload the whole cache? Ignite persistence
>> >> > > users I've been working with look puzzled once they realize there is no
>> >> > > way to warm up RAM after a restart. There are use cases that require
>> >> > > this.
>> >> > >
>> >> > > Can the current optimizations be expanded to the cache preloading
>> >> > > use case?
>> >> > >
>> >> > > --
>> >> > > Denis
>> >> > >
>> >> > > On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
>> >> > >  alexey.scherbakoff@gmail.com > wrote:
>> >> > >
>> >> > > > Summing up, I suggest adding a new public
>> >> > > > method IgniteCache.preloadPartition(partId).
>> >> > > >
>> >> > > > I will start preparing a PR for IGNITE-8873
>> >> > > > <  https://issues.apache.org/jira/browse/IGNITE-8873 > if no more
>> >> > > > objections follow.
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > Tue, 18 Sep 2018 at 10:50, Alexey Goncharuk <
>> >> > > >  alexey.goncharuk@gmail.com >:
>> >> > > >
>> >> > > > > Dmitriy,
>> >> > > > >
>> >> > > > > In my understanding, the proper fix for the scan query looks like a
>> >> > > > > big change, and it is unlikely that we will include it in Ignite 2.7.
>> >> > > > > On the other hand, the method suggested by Alexei is quite simple and
>> >> > > > > definitely fits Ignite 2.7, which will provide a better user
>> >> > > > > experience. Even with a proper scan query implemented, this method
>> >> > > > > can be useful in some specific scenarios, so we will not have to
>> >> > > > > deprecate it.
>> >> > > > >
>> >> > > > > --AG
>> >> > > > >
>> >> > > > > Mon, 17 Sep 2018 at 19:15, Dmitriy Pavlov <
>> >> > > > >  dpavlov.spb@gmail.com >:
>> >> > > > >
>> >> > > > > > As I understand it, this is not a hack; it is an advanced feature
>> >> > > > > > for warming up a partition. We can build warm-up of the whole
>> >> > > > > > cache by calling warm-up for each of its partitions. Users often
>> >> > > > > > ask about this feature and are not comfortable with our lazy
>> >> > > > > > loading.
>> >> > > > > >
>> >> > > > > > Please correct me if I misunderstood the idea.
>> >> > > > > >
>> >> > > > > > Mon, 17 Sep 2018 at 18:37, Dmitriy Setrakyan <
>> >> > > > > >  dsetrakyan@apache.org >:
>> >> > > > > >
>> >> > > > > > > I would rather fix the scan than hack the scan. Is there any
>> >> > > > > > > technical reason for hacking it now instead of fixing it
>> >> > > > > > > properly? Can some of the experts in this thread provide an
>> >> > > > > > > estimate of the complexity and the difference in work that would
>> >> > > > > > > be required for each approach?
>> >> > > > > > >
>> >> > > > > > > D.
>> >> > > > > > >
>> >> > > > > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
>> >> > > > > > >  alexey.goncharuk@gmail.com >
>> >> > > > > > > wrote:
>> >> > > > > > >
>> >> > > > > > > > I think it would be beneficial for some Ignite users if we added
>> >> > > > > > > > such a partition warmup method to the public API. The method
>> >> > > > > > > > should be well-documented and state that it may invalidate the
>> >> > > > > > > > existing page cache. It will be a very effective instrument until
>> >> > > > > > > > we add the proper scan ability that Vladimir was referring to.
>> >> > > > > > > >
>> >> > > > > > > > Mon, 17 Sep 2018 at 13:05, Maxim Muzafarov <
>> >> > > > > > > >  maxmuzaf@gmail.com >:
>> >> > > > > > > >
>> >> > > > > > > > > Folks,
>> >> > > > > > > > >
>> >> > > > > > > > > Such warming up can be an effective technique for performing
>> >> > > > > > > > > calculations that require large cache data reads, but I think
>> >> > > > > > > > > it is a single narrow use case among all the usages of the
>> >> > > > > > > > > Ignite store. Like all other powerful techniques, it should be
>> >> > > > > > > > > used wisely. In the general case, I think we should consider
>> >> > > > > > > > > the other techniques mentioned by Vladimir, and maybe create
>> >> > > > > > > > > something like `global statistics of cache data usage` to
>> >> > > > > > > > > choose the best technique in each case.
>> >> > > > > > > > >
>> >> > > > > > > > > For instance, it's not obvious what would take longer:
>> >> > > > > > > > > multi-block reads or 50 single-block reads issued
>> >> > > > > > > > > sequentially. It strongly depends on the underlying hardware
>> >> > > > > > > > > and might also depend on how the workload uses system
>> >> > > > > > > > > resources (CPU-intensive calculations and I/O access). But
>> >> > > > > > > > > `statistics` will help us choose the right way.
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
>> >> > > > > > > > >  dpavlov.spb@gmail.com > wrote:
>> >> > > > > > > > >
>> >> > > > > > > > > > Hi Alexei,
>> >> > > > > > > > > >
>> >> > > > > > > > > > I did not find any PRs associated with the ticket to check
>> >> > > > > > > > > > the code changes behind this idea. Are there any PRs?
>> >> > > > > > > > > >
>> >> > > > > > > > > > If we create some forward scan of pages, it should be a
>> >> > > > > > > > > > very intelligent algorithm that takes a lot of parameters
>> >> > > > > > > > > > into account (how much RAM is free, how likely we are to
>> >> > > > > > > > > > need the next page, etc). We had a private talk about such
>> >> > > > > > > > > > an idea some time ago.
>> >> > > > > > > > > >
>> >> > > > > > > > > > In my experience, Linux systems already do such read-ahead
>> >> > > > > > > > > > of file data (for file descriptors flagged as sequential),
>> >> > > > > > > > > > but some prefetching of data at the application level may
>> >> > > > > > > > > > be useful for O_DIRECT file descriptors.
>> >> > > > > > > > > >
>> >> > > > > > > > > > One more concern from me is about selecting the right
>> >> > > > > > > > > > place in the system to do such a prefetch.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Sincerely,
>> >> > > > > > > > > > Dmitriy Pavlov
>> >> > > > > > > > > >
>> >> > > > > > > > > > Sun, 16 Sep 2018 at 19:54, Vladimir Ozerov <
>> >> > > > > > > > > >  vozerov@gridgain.com >:
>> >> > > > > > > > > >
>> >> > > > > > > > > > > Hi Alex,
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > It is good that you observed a speedup. But I do not think
>> >> > > > > > > > > > > this solution works for the product in the general case.
>> >> > > > > > > > > > > The amount of RAM is limited, and even a single partition
>> >> > > > > > > > > > > may need more space than the RAM available. Moving a lot
>> >> > > > > > > > > > > of pages into page memory for a scan means that you evict
>> >> > > > > > > > > > > a lot of other pages, which will ultimately lead to bad
>> >> > > > > > > > > > > performance of subsequent queries and defeat the LRU
>> >> > > > > > > > > > > algorithms, which are of great importance for good
>> >> > > > > > > > > > > database performance.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Database vendors choose another approach: skip the
>> >> > > > > > > > > > > B-trees, iterate directly over data pages, read them in
>> >> > > > > > > > > > > multi-block fashion, and use a separate scan buffer to
>> >> > > > > > > > > > > avoid excessive eviction of other hot pages. A
>> >> > > > > > > > > > > corresponding ticket for SQL exists [1], but the idea is
>> >> > > > > > > > > > > common to all parts of the system that require scans.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > As for the proposed solution, it might be a good idea to
>> >> > > > > > > > > > > add a special API to "warm up" a partition, with a clear
>> >> > > > > > > > > > > explanation of the pros (fast scan after warmup) and cons
>> >> > > > > > > > > > > (slowdown of any other operations). But I think we should
>> >> > > > > > > > > > > not make this approach part of normal scans.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Vladimir.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
>> >> > > > > > > > > > >
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
>> >> > > > > > > > > > >  alexey.scherbakoff@gmail.com > wrote:
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > > Igniters,
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > My use case involves a scenario where it's necessary to
>> >> > > > > > > > > > > > iterate over a large (many TBs) persistent cache, doing
>> >> > > > > > > > > > > > some calculation on the data read.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > The basic solution is to iterate over the cache using
>> >> > > > > > > > > > > > ScanQuery.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > This turns out to be slow, because iteration over the
>> >> > > > > > > > > > > > cache involves a lot of random disk access for reading
>> >> > > > > > > > > > > > data pages referenced from leaf pages by links.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > This is especially true when data is stored on disks with
>> >> > > > > > > > > > > > slow random access, like SAS disks. In my case, on a
>> >> > > > > > > > > > > > modern SAS disk array, the reading speed was several
>> >> > > > > > > > > > > > MB/sec, while the sequential read speed in a perf test
>> >> > > > > > > > > > > > was on the order of GB/sec.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > I was able to fix the issue by using ScanQuery with an
>> >> > > > > > > > > > > > explicit partition set and running simple warmup code
>> >> > > > > > > > > > > > before each partition scan.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > The code pins cold pages in memory in sequential order,
>> >> > > > > > > > > > > > thus eliminating random disk access. The speedup was on
>> >> > > > > > > > > > > > the order of 100x.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > I suggest adding the improvement to the product's core by
>> >> > > > > > > > > > > > always sequentially preloading pages for all internal
>> >> > > > > > > > > > > > partition iterations (cache iterators, scan queries, SQL
>> >> > > > > > > > > > > > queries with a scan plan) if the partition is cold (low
>> >> > > > > > > > > > > > number of pinned pages).
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > This should also speed up rebalancing from cold
>> >> > > > > > > > > > > > partitions.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Ignite JIRA ticket [1]
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Thoughts?
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-8873
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > --
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Best regards,
>> >> > > > > > > > > > > > Alexei Scherbakov
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > >
>> >> > > > > > > > > >
>> >> > > > > > > > > --
>> >> > > > > > > > > --
>> >> > > > > > > > > Maxim Muzafarov
>> >> > > > > > > > >
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > >
>> >> > > > Best regards,
>> >> > > > Alexei Scherbakov
>> >> > > >
>> >> > >
>> >> >
>> >>
>>
>>
>> --
>> Zhenya Stanilovsky
>>


-- 
Zhenya Stanilovsky

UNCHECKED Re: Re[4]: Cache scan efficiency

Posted by Maxim Muzafarov <ma...@gmail.com>.

Alexei,

> Summing up, I suggest adding new public
> method IgniteCache.preloadPartition(partId).

If I understand the use case correctly, the user wants to retrieve all the data
from the cache (not only a single partition) on a slow HDD. So my suggestion is
to add public API methods like these:

`public IgniteCache<K, V> withPartitionsWarmup();`
`public IgniteCache<K, V> withPartitionWarmup(int partition);`
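
A minimal sketch of how these two variants could relate, assuming a per-partition preload primitive such as the `preloadPartition(partId)` proposed earlier in the thread. The `PartitionPreloader` interface and both helper methods are hypothetical stand-ins written for illustration; they are not Ignite API.

```java
import java.util.ArrayList;
import java.util.List;

public class WarmupSketch {
    /** Hypothetical stand-in for the part of a cache that can preload one partition. */
    public interface PartitionPreloader {
        /** Number of partitions in the cache. */
        int partitions();

        /** Loads the pages of one partition into memory. */
        void preloadPartition(int partId);
    }

    /** Single-partition warmup: validates the id, then preloads that partition only. */
    public static void warmupPartition(PartitionPreloader cache, int partId) {
        if (partId < 0 || partId >= cache.partitions())
            throw new IllegalArgumentException("Invalid partition: " + partId);
        cache.preloadPartition(partId);
    }

    /** Whole-cache warmup, built by preloading every partition in order. */
    public static int warmupAll(PartitionPreloader cache) {
        int n = cache.partitions();
        for (int p = 0; p < n; p++)
            cache.preloadPartition(p);
        return n; // number of partitions preloaded
    }

    public static void main(String[] args) {
        List<Integer> preloaded = new ArrayList<>();
        // Fake cache with 4 partitions that just records which ones were preloaded.
        PartitionPreloader fake = new PartitionPreloader() {
            @Override public int partitions() { return 4; }
            @Override public void preloadPartition(int partId) { preloaded.add(partId); }
        };
        warmupAll(fake);
        System.out.println(preloaded); // prints [0, 1, 2, 3]
    }
}
```

The whole-cache variant adds no new mechanism; it only loops over the per-partition call, so whether the data fits into RAM remains the caller's concern in both cases.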

-- 
--
Maxim Muzafarov
