Posted to dev@ignite.apache.org by Denis Magda <dm...@apache.org> on 2019/09/13 16:42:57 UTC

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

The issue has started hitting others who deploy Ignite persistence in production:
https://issues.apache.org/jira/browse/IGNITE-12152

Alex, I'm curious whether this is a fundamental problem. I asked the same
question in JIRA, but this discussion is probably a better place to get to
the bottom of it first:
https://issues.apache.org/jira/browse/IGNITE-10862

-
Denis


On Thu, Jan 10, 2019 at 6:01 AM Anton Vinogradov <av...@apache.org> wrote:

> Dmitriy,
>
> This does not look like a production-ready case :)
>
> How about:
> 1) Once you need to write an entry, you choose not a random "page from
> free-list with enough space" but the "page from free-list with enough space
> closest to the beginning of the file".
>
> 2) Once you remove an entry, you merge the rest of the entries on this page
> into the "page from free-list with enough space closest to the beginning of
> the file", if possible. (optional)
>
> 3) A partition file tail with empty pages can be removed at any time.
>
> 4) In case you have cold data inside the tail, just lock the page and
> migrate it to the "page from free-list with enough space closest to the
> beginning of the file".
> This operation can be scheduled.
>
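A minimal sketch of point 1 above - preferring the free page closest to the
start of the partition file so the tail can eventually be truncated. The data
structure and method names are illustrative, not Ignite's actual free lists:

    import java.util.Iterator;
    import java.util.Map;
    import java.util.TreeMap;

    public class LowOffsetFreeList {
        // Maps page offset within the partition file -> free bytes on that page.
        // TreeMap keeps offsets sorted, so iteration starts at the file beginning.
        private final TreeMap<Long, Integer> freePages = new TreeMap<>();

        /** Called when a page gains enough free space to be reusable. */
        public void onPageFreed(long pageOffset, int freeBytes) {
            freePages.put(pageOffset, freeBytes);
        }

        /** Lowest-offset page with at least 'required' free bytes, or -1 if none. */
        public long takePage(int required) {
            Iterator<Map.Entry<Long, Integer>> it = freePages.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, Integer> e = it.next();
                if (e.getValue() >= required) {
                    it.remove();              // hand the page out for the new entry
                    return e.getKey();
                }
            }
            return -1;                        // no suitable page; allocate a new one
        }
    }
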
> On Wed, Jan 9, 2019 at 4:43 PM Dmitriy Pavlov <dp...@apache.org> wrote:
>
> > In the TC Bot, I used to create a second cache named CacheV2 and migrate
> > the needed data from CacheV1 to V2.
> >
> > After CacheV1.destroy(), its files are removed and the disk space is freed.
> >
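A minimal sketch of this copy-and-destroy workaround. The cache names, key/value
types and config path are illustrative, and it assumes there is enough spare
disk space to hold both caches at once:

    import javax.cache.Cache;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class CopyCacheToReclaimSpace {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start("config/node.xml"); // illustrative config path

            IgniteCache<Long, String> v1 = ignite.cache("CacheV1");
            IgniteCache<Long, String> v2 = ignite.getOrCreateCache("CacheV2");

            // Copy only the entries that are still needed.
            for (Cache.Entry<Long, String> e : v1)
                v2.put(e.getKey(), e.getValue());

            // Destroying the old cache removes its partition files and frees disk space.
            v1.destroy();
        }
    }
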
> > Wed, Jan 9, 2019 at 12:04, Павлухин Иван <vo...@gmail.com>:
> >
> > > Vyacheslav,
> > >
> > > Have you investigated how other vendors (Oracle, Postgres) tackle this
> > > problem?
> > >
> > > I have one wild idea. Could the problem be solved by stopping a node
> > > which needs to be defragmented, clearing its persistence files and
> > > restarting the node? After rebalancing, the node will receive all its
> > > data back without fragmentation. I see a big downside -- sending data
> > > across the network. But perhaps we can play with affinity and start a
> > > new node on the same host which will receive the same data; after that
> > > the old node can be stopped. It looks more like a workaround, but
> > > perhaps it can be turned into a workable solution.
> > >
> > > Wed, Jan 9, 2019 at 10:49, Vyacheslav Daradur <da...@gmail.com>:
> > > >
> > > > Yes, it's about Page Memory defragmentation.
> > > >
> > > > Pages in partition files are stored sequentially; possibly it makes
> > > > sense to defragment pages first to avoid inter-page gaps, since we use
> > > > page offsets to manage them.
> > > >
> > > > I filed an issue [1]; I hope we will be able to find resources to
> > > > solve it before the 2.8 release.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-10862
> > > >
> > > > On Sat, Dec 29, 2018 at 10:47 AM Павлухин Иван <vo...@gmail.com>
> > > wrote:
> > > > >
> > > > > I suppose it is about Ignite Page Memory page defragmentation.
> > > > >
> > > > > We can get 100 allocated pages, each of which becomes only e.g. 50%
> > > > > filled after removing some entries. But they will still occupy the
> > > > > space of 100 pages on the hard drive.
> > > > >
> > > > > Fri, Dec 28, 2018 at 20:45, Denis Magda <dm...@apache.org>:
> > > > > >
> > > > > > Shouldn't the OS take care of defragmentation? What we need to do
> > > > > > is to give a way to remove stale data and "release" the allocated
> > > > > > space somehow through tools, MBeans or API methods.
> > > > > >
> > > > > > --
> > > > > > Denis
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 28, 2018 at 6:24 AM Vladimir Ozerov <
> > > vozerov@gridgain.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Vyacheslav,
> > > > > > >
> > > > > > > AFAIK this is not implemented. Shrinking/defragmentation is an
> > > > > > > important optimization, not only because it releases free space,
> > > > > > > but also because it decreases the total number of pages. But it
> > > > > > > is not very easy to implement, as you have to reshuffle both data
> > > > > > > entries and index entries while maintaining consistency for
> > > > > > > concurrent reads and updates at the same time. Alternatively, we
> > > > > > > can think of offline defragmentation. It will be easier to
> > > > > > > implement and faster, but concurrent operations will be
> > > > > > > prohibited.
> > > > > > >
> > > > > > > On Fri, Dec 28, 2018 at 4:08 PM Vyacheslav Daradur <
> > > daradurvs@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Igniters, we have faced the following problem on one of our
> > > > > > > > deployments.
> > > > > > > >
> > > > > > > > Let's imagine that we have used an IgniteCache with PDS enabled
> > > > > > > > over time:
> > > > > > > > - hardware disk space was consumed as the amount of data grew,
> > > > > > > > e.g. to 100 GB;
> > > > > > > > - then we removed obsolete data, e.g. 50 GB, which had become
> > > > > > > > useless for us;
> > > > > > > > - disk space stopped growing with new data, but it was not
> > > > > > > > released and still took 100 GB instead of the expected 50 GB;
> > > > > > > >
> > > > > > > > Another use case:
> > > > > > > > - a user extracts data from an IgniteCache to store it in a
> > > > > > > > separate IgniteCache or another store;
> > > > > > > > - the disk is still occupied and the user is not able to store
> > > > > > > > the data in the different cache on the same cluster because of
> > > > > > > > the disk limitation;
> > > > > > > >
> > > > > > > > How can we help the user free up the disk space if the amount
> > > > > > > > of data in the IgniteCache has been reduced many times over and
> > > > > > > > will not be increased in the near future?
> > > > > > > >
> > > > > > > > AFAIK, we have a mechanism for reusing memory pages that allows
> > > > > > > > us to use pages which were allocated and held now-removed data
> > > > > > > > for storing new data.
> > > > > > > > Are there any chances to shrink the data and free up space on
> > > > > > > > disk (with defragmentation if possible)?
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I think that a good, robust approach is to start a background thread which
will try to compact pages and remove unneeded ones. It should only be active
when the system is reasonably idle, or if there's a severe fragmentation
problem.

However, I am aware that implementing such a heuristic cleaner is a
challenging task.
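
A rough sketch of such a background cleaner is below; the isIdle(),
fragmentationRatio() and compactSomePages() hooks are hypothetical and would
need to be wired to real metrics and page-store code:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class BackgroundDefragmenter {
        private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

        public void start() {
            // Re-check periodically; do nothing unless the node is quiet or badly fragmented.
            scheduler.scheduleWithFixedDelay(this::maybeCompact, 1, 1, TimeUnit.MINUTES);
        }

        private void maybeCompact() {
            if (isIdle() || fragmentationRatio() > 0.5)
                compactSomePages(128); // small batches to limit impact on concurrent load
        }

        public void stop() {
            scheduler.shutdownNow();
        }

        // --- hypothetical hooks ---
        private boolean isIdle() { return false; }
        private double fragmentationRatio() { return 0.0; }
        private void compactSomePages(int maxPages) { /* move live entries, free tail pages */ }
    }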

Regards,
-- 
Ilya Kasnacheev


Fri, Oct 4, 2019 at 19:38, Alexey Goncharuk <al...@gmail.com>:

> Maxim,
>
> Having a cluster-wide lock for a cache does not improve availability of the
> solution. A user cannot defragment a cache if the cache is involved in a
> mission-critical operation, so having a lock on such a cache is equivalent
> to the whole cluster shutdown.
>
> We should decide between either a single offline node or a more complex
> fully online solution.
>
> Fri, Oct 4, 2019 at 11:55, Maxim Muzafarov <mm...@apache.org>:
>
> > Igniters,
> >
> > This thread seems to be endless, but what if some kind of distributed
> > cache group write lock (exclusive for some of the internal Ignite
> > processes) is introduced? I think it will help to solve a batch of
> > problems, like:
> >
> > 1. defragmentation of all cache group partitions on the local node
> > without concurrent updates.
> > 2. improving data loading with data streamer isolation mode [1]. It
> > seems we should not allow concurrent updates to a cache while we are on
> > the `fast data load` step.
> > 3. recovery from a snapshot without cache stop/start actions
> >
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-11793
> >
> > On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com> wrote:
> > >
> > > Hi
> > >
> > > I'm not sure that taking a node offline is the best way to do that.
> > > Cons:
> > >  - different caches may have different degrees of fragmentation, but we
> > >    are forced to stop the whole node
> > >  - taking a node offline is a maintenance operation that will require
> > >    adding +1 backup to reduce the risk of data loss
> > >  - baseline auto adjustment?
> > >  - impact on index rebuild?
> > >  - cache configuration changes (or destroy) during node offline
> > >
> > > What about other ways without stopping the node? E.g. take a cache
> > > group on a node offline? Add a *defrag <cache_group>* command to
> > > control.sh to force-start rebalance internally on the node, with the
> > > expected impact on performance.
> > >
> > >
> > >
> > > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org>
> wrote:
> > >
> > > > Alexey,
> > > > As for me, it does not matter whether it will be an IEP, an umbrella,
> > > > or a single issue.
> > > > The most important thing is the Assignee :)
> > > >
> > > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > > > alexey.goncharuk@gmail.com>
> > > > wrote:
> > > >
> > > > > Anton, do you think we should file a single ticket for this or
> > > > > should we go with an IEP? As of now, the change does not look big
> > > > > enough for an IEP for me.
> > > > >
> > > > > Thu, Oct 3, 2019 at 11:18, Anton Vinogradov <av...@apache.org>:
> > > > >
> > > > > > Alexey,
> > > > > >
> > > > > > Sounds good to me.
> > > > > >
> > > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > > > > > alexey.goncharuk@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Anton,
> > > > > > >
> > > > > > > Switching a partition to and from the SHRINKING state will
> > > > > > > require intricate synchronizations in order to properly determine
> > > > > > > the start position for historical rebalance without PME.
> > > > > > >
> > > > > > > I would still go with an offline-node approach, but instead of
> > > > > > > cleaning the persistence, we can do effective defragmentation
> > > > > > > when the node is offline because we are sure that there is no
> > > > > > > concurrent load. After the defragmentation completes, we bring
> > > > > > > the node back to the cluster and historical rebalance will kick
> > > > > > > in automatically. It will still require manual node restarts, but
> > > > > > > since the data is not removed, there are no additional risks.
> > > > > > > Also, this will be an excellent solution for those who can afford
> > > > > > > downtime and execute the defragment command on all nodes in the
> > > > > > > cluster simultaneously - this will be the fastest way possible.
> > > > > > >
> > > > > > > --AG
> > > > > > >
> > > > > > > Mon, Sep 30, 2019 at 09:29, Anton Vinogradov <av@apache.org
> >:
> > > > > > >
> > > > > > > > Alexei,
> > > > > > > > >> stopping fragmented node and removing partition data, then
> > > > > > > > >> starting it again
> > > > > > > >
> > > > > > > > That's exactly what we're doing to solve the fragmentation
> > > > > > > > issue.
> > > > > > > > The problem here is that we have to perform N/B
> > > > > > > > restart-rebalance operations (N - cluster size, B - backups
> > > > > > > > count) and it takes a lot of time, with a risk of losing data.
> > > > > > > >
> > > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Probably this should be allowed via the public API; actually,
> > > > > > > > > this is the same as manual rebalancing.
> > > > > > > > >
> > > > > > > > > Fri, Sep 27, 2019 at 17:40, Alexei Scherbakov <
> > > > > > > > > alexey.scherbakoff@gmail.com>:
> > > > > > > > >
> > > > > > > > > > The poor man's solution for the problem would be stopping
> > > > > > > > > > the fragmented node and removing its partition data, then
> > > > > > > > > > starting it again, allowing a full state transfer already
> > > > > > > > > > without deletes.
> > > > > > > > > > Rinse and repeat for all owners.
> > > > > > > > > >
> > > > > > > > > > Anton Vinogradov, would this work for you as a workaround?
> > > > > > > > > >
> > > > > > > > > >> Thu, Sep 19, 2019 at 13:03, Anton Vinogradov <
> > av@apache.org
> > > > >:
> > > > > > > > > >
> > > > > > > > > >> Alexey,
> > > > > > > > > >>
> > > > > > > > > >> Let's combine your and Ivan's proposals.
> > > > > > > > > >>
> > > > > > > > > >> >> vacuum command, which acquires exclusive table lock, so
> > > > > > > > > >> >> no concurrent activities on the table are possible.
> > > > > > > > > >> and
> > > > > > > > > >> >> Could the problem be solved by stopping a node which
> > > > > > > > > >> >> needs to be defragmented, clearing persistence files and
> > > > > > > > > >> >> restarting the node?
> > > > > > > > > >> >> After rebalancing the node will receive all data back
> > > > > > > > > >> >> without fragmentation.
> > > > > > > > > >>
> > > > > > > > > >> How about having a special partition state, SHRINKING?
> > > > > > > > > >> This state should mean that the partition is unavailable
> > > > > > > > > >> for reads and updates but keeps its update counters and is
> > > > > > > > > >> not marked as lost, renting or evicted.
> > > > > > > > > >> In this state we are able to iterate over the partition and
> > > > > > > > > >> apply its entries to another file in a compact way.
> > > > > > > > > >> Indices should be updated during the copy-on-shrink
> > > > > > > > > >> procedure or at shrink completion.
> > > > > > > > > >> Once the shrunk file is ready we should replace the
> > > > > > > > > >> original partition file with it and mark it as MOVING,
> > > > > > > > > >> which will start the historical rebalance.
> > > > > > > > > >> Shrinking should be performed during low-activity periods,
> > > > > > > > > >> but even if we find that activity was high and historical
> > > > > > > > > >> rebalance is not suitable, we may just remove the file and
> > > > > > > > > >> use regular rebalance to restore the partition (this will
> > > > > > > > > >> also lead to a shrink).
> > > > > > > > > >>
> > > > > > > > > >> BTW, it seems we are able to implement partition shrink in
> > > > > > > > > >> a cheap way.
> > > > > > > > > >> We may just use the rebalancing code to apply the fat
> > > > > > > > > >> partition's entries to the new file.
> > > > > > > > > >> So, there are 3 stages here: local rebalance, index update
> > > > > > > > > >> and global historical rebalance.
> > > > > > > > > >>
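A rough sketch of the shrink flow proposed above. OWNING and MOVING mirror
partition states already mentioned in the thread; SHRINKING is the proposed
addition, and everything else here is illustrative pseudostructure, not
actual Ignite code:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    public class PartitionShrinkSketch {
        enum PartState { OWNING, MOVING, SHRINKING } // SHRINKING is the proposed new state

        private PartState state = PartState.OWNING;

        /** Three stages: local "rebalance" into a compact file, index update, then MOVING. */
        void shrink(Path partFile, Path compactFile) throws Exception {
            state = PartState.SHRINKING;            // reads/updates blocked, counters kept
            copyLiveEntries(partFile, compactFile); // stage 1: iterate partition, skip gaps
            updateIndexes(compactFile);             // stage 2: index update
            Files.move(compactFile, partFile, StandardCopyOption.REPLACE_EXISTING);
            state = PartState.MOVING;               // stage 3: historical rebalance catches up
        }

        private void copyLiveEntries(Path src, Path dst) { /* reuse rebalancing code, per the proposal */ }
        private void updateIndexes(Path part) { /* rebuild or patch index entries */ }
    }
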
> > > > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > > > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Anton,
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > > >> The solution which Anton suggested does not look
> > > > > > > > > >> > > >> easy because it will most likely significantly hurt
> > > > > > > > > >> > > >> performance
> > > > > > > > > >> > > Mostly agree here, but what drop do we expect? What
> > > > > > > > > >> > > price are we ready to pay?
> > > > > > > > > >> > > Not sure, but it seems some vendors are ready to pay,
> > > > > > > > > >> > > for example, a 5% drop for this.
> > > > > > > > > >> >
> > > > > > > > > >> > 5% may be a big drop for some use-cases, so I think we
> > > > > > > > > >> > should look at how to improve performance, not how to
> > > > > > > > > >> > make it worse.
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > >
> > > > > > > > > >> > > >> it is hard to maintain a data structure to choose
> > > > > > > > > >> > > >> "page from free-list with enough space closest to
> > > > > > > > > >> > > >> the beginning of the file".
> > > > > > > > > >> > > We can just split each free-list bucket into a pair and
> > > > > > > > > >> > > use the first for pages in the first half of the file
> > > > > > > > > >> > > and the second for the last half.
> > > > > > > > > >> > > Only two buckets are required here since, during the
> > > > > > > > > >> > > file shrink, the first bucket's window will shrink too.
> > > > > > > > > >> > > It seems this gives us the same price on put - just use
> > > > > > > > > >> > > the first bucket in case it's not empty.
> > > > > > > > > >> > > The remove price (with merge) will be increased, of
> > > > > > > > > >> > > course.
> > > > > > > > > >> > >
> > > > > > > > > >> > > The compromise solution is to have priority put (to the
> > > > > > > > > >> > > first part of the file), while keeping removal as is,
> > > > > > > > > >> > > and schedulable per-page migration for the rest of the
> > > > > > > > > >> > > data during low-activity periods.
> > > > > > > > > >> > >
> > > > > > > > > >> > Free lists are large and slow by themselves, and it is
> > > > > > > > > >> > expensive to checkpoint and read them on start, so as a
> > > > > > > > > >> > long-term solution I would look into removing them.
> > > > > > > > > >> > Moreover, not sure if adding yet another background
> > > > > > > > > >> > process will improve the codebase reliability and
> > > > > > > > > >> > simplicity.
> > > > > > > > > >> >
> > > > > > > > > >> > If we want to go the hard path, I would look at a free
> > > > > > > > > >> > page tracking bitmap - a special bitmask page, where each
> > > > > > > > > >> > page in an adjacent block is marked as 0 if it has free
> > > > > > > > > >> > space more than a certain configurable threshold (say,
> > > > > > > > > >> > 80%) - free, and 1 if less (full). Some vendors have
> > > > > > > > > >> > successfully implemented this approach, which looks much
> > > > > > > > > >> > more promising, but harder to implement.
> > > > > > > > > >> >
> > > > > > > > > >> > --AG
> > > > > > > > > >> >
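A minimal sketch of the free-page tracking bitmap idea above - one bit per page
in an adjacent block, 0 for "free enough", 1 for "full". The class, hooks and
threshold handling are illustrative, not an existing Ignite structure:

    import java.util.BitSet;

    public class FreePageBitmap {
        private final BitSet full;          // bit i = 1 -> page i is considered full
        private final double freeThreshold; // e.g. 0.8: free if >80% of the page is free

        public FreePageBitmap(int pagesInBlock, double freeThreshold) {
            this.full = new BitSet(pagesInBlock);
            this.freeThreshold = freeThreshold;
        }

        /** Update the bit whenever a put/remove changes how much of the page is free. */
        public void onPageFreeSpaceChanged(int pageIdx, double freeFraction) {
            full.set(pageIdx, freeFraction <= freeThreshold); // 1 = full, 0 = free
        }

        /** Index of the first page still marked free; low indexes favor the file start. */
        public int firstFreePage() {
            return full.nextClearBit(0); // may point past the tracked block if all are full
        }
    }
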
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Alexei Scherbakov
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Alexei Scherbakov
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sergey Kozlov
> > > GridGain Systems
> > > www.gridgain.com
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Sergey Kozlov <sk...@gridgain.com>.
Alexey

I'm ok with the suggested way in [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12263

On Tue, Oct 8, 2019 at 9:59 PM Denis Magda <dm...@apache.org> wrote:

> Anton,
>
> Seems like we have a name for the defragmentation mode with a downtime -
> Rolling Defrag )
>
> -
> Denis
>
>
> On Mon, Oct 7, 2019 at 11:04 PM Anton Vinogradov <av...@apache.org> wrote:
>
> > Denis,
> >
> > I like the idea that defragmentation is just an additional step on a node
> > (re)start like we perform PDS recovery now.
> > We may just use special key to specify node should defragment persistence
> > on (re)start.
> > Defragmentation can be the part of Rolling Upgrade in this case :)
> > It seems to be not a problem to restart nodes one-by-one, this will "eat"
> > only one backup guarantee.
> >
> > On Mon, Oct 7, 2019 at 8:28 PM Denis Magda <dm...@apache.org> wrote:
> >
> > > Alex, thanks for the summary and proposal. Anton, Ivan and others who
> > took
> > > part in this discussion, what're your thoughts? I see this
> > > rolling-upgrades-based approach as a reasonable solution. Even though a
> > > node shutdown is expected, the procedure doesn't lead to the cluster
> > outage
> > > meaning it can be utilized for 24x7 production environments.
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Mon, Oct 7, 2019 at 1:35 AM Alexey Goncharuk <
> > > alexey.goncharuk@gmail.com>
> > > wrote:
> > >
> > > > Created a ticket for the first stage of this improvement. This can
> be a
> > > > first change towards the online mode suggested by Sergey and Anton.
> > > > https://issues.apache.org/jira/browse/IGNITE-12263
> > > >
> > > > пт, 4 окт. 2019 г. в 19:38, Alexey Goncharuk <
> > alexey.goncharuk@gmail.com
> > > >:
> > > >
> > > > > Maxim,
> > > > >
> > > > > Having a cluster-wide lock for a cache does not improve
> availability
> > of
> > > > > the solution. A user cannot defragment a cache if the cache is
> > involved
> > > > in
> > > > > a mission-critical operation, so having a lock on such a cache is
> > > > > equivalent to the whole cluster shutdown.
> > > > >
> > > > > We should decide between either a single offline node or a more
> > complex
> > > > > fully online solution.
> > > > >
> > > > > пт, 4 окт. 2019 г. в 11:55, Maxim Muzafarov <mm...@apache.org>:
> > > > >
> > > > >> Igniters,
> > > > >>
> > > > >> This thread seems to be endless, but we if some kind of cache
> group
> > > > >> distributed write lock (exclusive for some of the internal Ignite
> > > > >> process) will be introduced? I think it will help to solve a batch
> > of
> > > > >> problems, like:
> > > > >>
> > > > >> 1. defragmentation of all cache group partitions on the local node
> > > > >> without concurrent updates.
> > > > >> 2. improve data loading with data streamer isolation mode [1]. It
> > > > >> seems we should not allow concurrent updates to cache if we on
> `fast
> > > > >> data load` step.
> > > > >> 3. recovery from a snapshot without cache stop\start actions
> > > > >>
> > > > >>
> > > > >> [1] https://issues.apache.org/jira/browse/IGNITE-11793
> > > > >>
> > > > >> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com>
> > > > wrote:
> > > > >> >
> > > > >> > Hi
> > > > >> >
> > > > >> > I'm not sure that node offline is a best way to do that.
> > > > >> > Cons:
> > > > >> >  - different caches may have different defragmentation but we
> > force
> > > to
> > > > >> stop
> > > > >> > whole node
> > > > >> >  - offline node is a maintenance operation will require to add
> +1
> > > > >> backup to
> > > > >> > reduce the risk of data loss
> > > > >> >  - baseline auto adjustment?
> > > > >> >  - impact to index rebuild?
> > > > >> >  - cache configuration changes (or destroy) during node offline
> > > > >> >
> > > > >> > What about other ways without node stop? E.g. make cache group
> on
> > a
> > > > node
> > > > >> > offline? Add *defrag <cache_group> *command to control.sh to
> force
> > > > start
> > > > >> > rebalance internally in the node with expected impact to
> > > performance.
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av@apache.org
> >
> > > > wrote:
> > > > >> >
> > > > >> > > Alexey,
> > > > >> > > As for me, it does not matter will it be IEP, umbrella or a
> > single
> > > > >> issue.
> > > > >> > > The most important thing is Assignee :)
> > > > >> > >
> > > > >> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > > > >> > > alexey.goncharuk@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Anton, do you think we should file a single ticket for this
> or
> > > > >> should we
> > > > >> > > go
> > > > >> > > > with an IEP? As of now, the change does not look big enough
> > for
> > > an
> > > > >> IEP
> > > > >> > > for
> > > > >> > > > me.
> > > > >> > > >
> > > > >> > > > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av@apache.org
> >:
> > > > >> > > >
> > > > >> > > > > Alexey,
> > > > >> > > > >
> > > > >> > > > > Sounds good to me.
> > > > >> > > > >
> > > > >> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > > > >> > > > > alexey.goncharuk@gmail.com>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Anton,
> > > > >> > > > > >
> > > > >> > > > > > Switching a partition to and from the SHRINKING state
> will
> > > > >> require
> > > > >> > > > > > intricate synchronizations in order to properly
> determine
> > > the
> > > > >> start
> > > > >> > > > > > position for historical rebalance without PME.
> > > > >> > > > > >
> > > > >> > > > > > I would still go with an offline-node approach, but
> > instead
> > > of
> > > > >> > > cleaning
> > > > >> > > > > the
> > > > >> > > > > > persistence, we can do effective defragmentation when
> the
> > > node
> > > > >> is
> > > > >> > > > offline
> > > > >> > > > > > because we are sure that there is no concurrent load.
> > After
> > > > the
> > > > >> > > > > > defragmentation completes, we bring the node back to the
> > > > >> cluster and
> > > > >> > > > > > historical rebalance will kick in automatically. It will
> > > still
> > > > >> > > require
> > > > >> > > > > > manual node restarts, but since the data is not removed,
> > > there
> > > > >> are no
> > > > >> > > > > > additional risks. Also, this will be an excellent
> solution
> > > for
> > > > >> those
> > > > >> > > > who
> > > > >> > > > > > can afford downtime and execute the defragment command
> on
> > > all
> > > > >> nodes
> > > > >> > > in
> > > > >> > > > > the
> > > > >> > > > > > cluster simultaneously - this will be the fastest way
> > > > possible.
> > > > >> > > > > >
> > > > >> > > > > > --AG
> > > > >> > > > > >
> > > > >> > > > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <
> > > av@apache.org
> > > > >:
> > > > >> > > > > >
> > > > >> > > > > > > Alexei,
> > > > >> > > > > > > >> stopping fragmented node and removing partition
> data,
> > > > then
> > > > >> > > > starting
> > > > >> > > > > it
> > > > >> > > > > > > again
> > > > >> > > > > > >
> > > > >> > > > > > > That's exactly what we're doing to solve the
> > fragmentation
> > > > >> issue.
> > > > >> > > > > > > The problem here is that we have to perform N/B
> > > > >> restart-rebalance
> > > > >> > > > > > > operations (N - cluster size, B - backups count) and
> it
> > > > takes
> > > > >> a lot
> > > > >> > > > of
> > > > >> > > > > > time
> > > > >> > > > > > > with risks to lose the data.
> > > > >> > > > > > >
> > > > >> > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > >> > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Probably this should be allowed to do using public
> > API,
> > > > >> actually
> > > > >> > > > this
> > > > >> > > > > > is
> > > > >> > > > > > > > same as manual rebalancing.
> > > > >> > > > > > > >
> > > > >> > > > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > > >> > > > > > > > alexey.scherbakoff@gmail.com>:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > The poor man's solution for the problem would be
> > > > stopping
> > > > >> > > > > fragmented
> > > > >> > > > > > > node
> > > > >> > > > > > > > > and removing partition data, then starting it
> again
> > > > >> allowing
> > > > >> > > full
> > > > >> > > > > > state
> > > > >> > > > > > > > > transfer already without deletes.
> > > > >> > > > > > > > > Rinse and repeat for all owners.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Anton Vinogradov, would this work for you as
> > > workaround
> > > > ?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <
> > > > >> av@apache.org
> > > > >> > > >:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >> Alexey,
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> Let's combine your and Ivan's proposals.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> >> vacuum command, which acquires exclusive table
> > > lock,
> > > > >> so no
> > > > >> > > > > > > concurrent
> > > > >> > > > > > > > >> activities on the table are possible.
> > > > >> > > > > > > > >> and
> > > > >> > > > > > > > >> >> Could the problem be solved by stopping a node
> > > which
> > > > >> needs
> > > > >> > > to
> > > > >> > > > > be
> > > > >> > > > > > > > >> defragmented, clearing persistence files and
> > > restarting
> > > > >> the
> > > > >> > > > node?
> > > > >> > > > > > > > >> >> After rebalancing the node will receive all
> data
> > > > back
> > > > >> > > without
> > > > >> > > > > > > > >> fragmentation.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> How about to have special partition state
> > SHRINKING?
> > > > >> > > > > > > > >> This state should mean that partition unavailable
> > for
> > > > >> reads
> > > > >> > > and
> > > > >> > > > > > > updates
> > > > >> > > > > > > > >> but
> > > > >> > > > > > > > >> should keep it's update-counters and should not
> be
> > > > >> marked as
> > > > >> > > > lost,
> > > > >> > > > > > > > renting
> > > > >> > > > > > > > >> or evicted.
> > > > >> > > > > > > > >> At this state we able to iterate over the
> partition
> > > and
> > > > >> apply
> > > > >> > > > it's
> > > > >> > > > > > > > entries
> > > > >> > > > > > > > >> to another file in a compact way.
> > > > >> > > > > > > > >> Indices should be updated during the
> copy-on-shrink
> > > > >> procedure
> > > > >> > > or
> > > > >> > > > > at
> > > > >> > > > > > > the
> > > > >> > > > > > > > >> shrink completion.
> > > > >> > > > > > > > >> Once shrank file is ready we should replace the
> > > > original
> > > > >> > > > partition
> > > > >> > > > > > > file
> > > > >> > > > > > > > >> with it and mark it as MOVING which will start
> the
> > > > >> historical
> > > > >> > > > > > > rebalance.
> > > > >> > > > > > > > >> Shrinking should be performed during the low
> > activity
> > > > >> periods,
> > > > >> > > > but
> > > > >> > > > > > > even
> > > > >> > > > > > > > in
> > > > >> > > > > > > > >> case we found that activity was high and
> historical
> > > > >> rebalance
> > > > >> > > is
> > > > >> > > > > not
> > > > >> > > > > > > > >> suitable we may just remove the file and use
> > regular
> > > > >> rebalance
> > > > >> > > > to
> > > > >> > > > > > > > restore
> > > > >> > > > > > > > >> the partition (this will also lead to shrink).
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> BTW, seems, we able to implement partition shrink
> > in
> > > a
> > > > >> cheap
> > > > >> > > > way.
> > > > >> > > > > > > > >> We may just use rebalancing code to apply fat
> > > > partition's
> > > > >> > > > entries
> > > > >> > > > > to
> > > > >> > > > > > > the
> > > > >> > > > > > > > >> new file.
> > > > >> > > > > > > > >> So, 3 stages here: local rebalance, indices
> update
> > > and
> > > > >> global
> > > > >> > > > > > > historical
> > > > >> > > > > > > > >> rebalance.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey
> Goncharuk <
> > > > >> > > > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> > Anton,
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > > >>  The solution which Anton suggested does
> not
> > > > look
> > > > >> easy
> > > > >> > > > > > because
> > > > >> > > > > > > it
> > > > >> > > > > > > > >> will
> > > > >> > > > > > > > >> > > most likely significantly hurt performance
> > > > >> > > > > > > > >> > > Mostly agree here, but what drop do we
> expect?
> > > What
> > > > >> price
> > > > >> > > do
> > > > >> > > > > we
> > > > >> > > > > > > > ready
> > > > >> > > > > > > > >> to
> > > > >> > > > > > > > >> > > pay?
> > > > >> > > > > > > > >> > > Not sure, but seems some vendors ready to
> pay,
> > > for
> > > > >> > > example,
> > > > >> > > > 5%
> > > > >> > > > > > > drop
> > > > >> > > > > > > > >> for
> > > > >> > > > > > > > >> > > this.
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > 5% may be a big drop for some use-cases, so I
> > think
> > > > we
> > > > >> > > should
> > > > >> > > > > look
> > > > >> > > > > > > at
> > > > >> > > > > > > > >> how
> > > > >> > > > > > > > >> > to improve performance, not how to make it
> worse.
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > > >> it is hard to maintain a data structure to
> > > > choose
> > > > >> "page
> > > > >> > > > > from
> > > > >> > > > > > > > >> free-list
> > > > >> > > > > > > > >> > > with enough space closest to the beginning of
> > the
> > > > >> file".
> > > > >> > > > > > > > >> > > We can just split each free-list bucket to
> the
> > > > >> couple and
> > > > >> > > > use
> > > > >> > > > > > > first
> > > > >> > > > > > > > >> for
> > > > >> > > > > > > > >> > > pages in the first half of the file and the
> > > second
> > > > >> for the
> > > > >> > > > > last.
> > > > >> > > > > > > > >> > > Only two buckets required here since, during
> > the
> > > > file
> > > > >> > > > shrink,
> > > > >> > > > > > > first
> > > > >> > > > > > > > >> > > bucket's window will be shrank too.
> > > > >> > > > > > > > >> > > Seems, this give us the same price on put,
> just
> > > use
> > > > >> the
> > > > >> > > > first
> > > > >> > > > > > > bucket
> > > > >> > > > > > > > >> in
> > > > >> > > > > > > > >> > > case it's not empty.
> > > > >> > > > > > > > >> > > Remove price (with merge) will be increased,
> of
> > > > >> course.
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > > The compromise solution is to have priority
> put
> > > (to
> > > > >> the
> > > > >> > > > first
> > > > >> > > > > > path
> > > > >> > > > > > > > of
> > > > >> > > > > > > > >> the
> > > > >> > > > > > > > >> > > file), with keeping removal as is, and
> > > schedulable
> > > > >> > > per-page
> > > > >> > > > > > > > migration
> > > > >> > > > > > > > >> for
> > > > >> > > > > > > > >> > > the rest of the data during the low activity
> > > > period.
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > Free lists are large and slow by themselves, it
> > is
> > > > >> expensive
> > > > >> > > > to
> > > > >> > > > > > > > >> checkpoint
> > > > >> > > > > > > > >> > and read them on start, so as a long-term
> > solution
> > > I
> > > > >> would
> > > > >> > > > look
> > > > >> > > > > > into
> > > > >> > > > > > > > >> > removing them. Moreover, not sure if adding yet
> > > > another
> > > > >> > > > > background
> > > > >> > > > > > > > >> process
> > > > >> > > > > > > > >> > will improve the codebase reliability and
> > > simplicity.
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > If we want to go the hard path, I would look at
> > > free
> > > > >> page
> > > > >> > > > > tracking
> > > > >> > > > > > > > >> bitmap -
> > > > >> > > > > > > > >> > a special bitmask page, where each page in an
> > > > adjacent
> > > > >> block
> > > > >> > > > is
> > > > >> > > > > > > marked
> > > > >> > > > > > > > >> as 0
> > > > >> > > > > > > > >> > if it has free space more than a certain
> > > configurable
> > > > >> > > > threshold
> > > > >> > > > > > > (say,
> > > > >> > > > > > > > >> 80%)
> > > > >> > > > > > > > >> > - free, and 1 if less (full). Some vendors have
> > > > >> successfully
> > > > >> > > > > > > > implemented
> > > > >> > > > > > > > >> > this approach, which looks much more promising,
> > but
> > > > >> harder
> > > > >> > > to
> > > > >> > > > > > > > implement.
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > --AG
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > --
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Best regards,
> > > > >> > > > > > > > > Alexei Scherbakov
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > --
> > > > >> > > > > > > >
> > > > >> > > > > > > > Best regards,
> > > > >> > > > > > > > Alexei Scherbakov
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Sergey Kozlov
> > > > >> > GridGain Systems
> > > > >> > www.gridgain.com
> > > > >>
> > > > >
> > > >
> > >
> >
>


-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Denis Magda <dm...@apache.org>.
Anton,

Seems like we have a name for the defragmentation mode with downtime -
Rolling Defrag )

-
Denis


On Mon, Oct 7, 2019 at 11:04 PM Anton Vinogradov <av...@apache.org> wrote:

> Denis,
>
> I like the idea that defragmentation is just an additional step on a node
> (re)start like we perform PDS recovery now.
> We may just use special key to specify node should defragment persistence
> on (re)start.
> Defragmentation can be the part of Rolling Upgrade in this case :)
> It seems to be not a problem to restart nodes one-by-one, this will "eat"
> only one backup guarantee.
>
> On Mon, Oct 7, 2019 at 8:28 PM Denis Magda <dm...@apache.org> wrote:
>
> > Alex, thanks for the summary and proposal. Anton, Ivan and others who
> took
> > part in this discussion, what're your thoughts? I see this
> > rolling-upgrades-based approach as a reasonable solution. Even though a
> > node shutdown is expected, the procedure doesn't lead to the cluster
> outage
> > meaning it can be utilized for 24x7 production environments.
> >
> > -
> > Denis
> >
> >
> > On Mon, Oct 7, 2019 at 1:35 AM Alexey Goncharuk <
> > alexey.goncharuk@gmail.com>
> > wrote:
> >
> > > Created a ticket for the first stage of this improvement. This can be a
> > > first change towards the online mode suggested by Sergey and Anton.
> > > https://issues.apache.org/jira/browse/IGNITE-12263
> > >
> > > пт, 4 окт. 2019 г. в 19:38, Alexey Goncharuk <
> alexey.goncharuk@gmail.com
> > >:
> > >
> > > > Maxim,
> > > >
> > > > Having a cluster-wide lock for a cache does not improve availability
> of
> > > > the solution. A user cannot defragment a cache if the cache is
> involved
> > > in
> > > > a mission-critical operation, so having a lock on such a cache is
> > > > equivalent to the whole cluster shutdown.
> > > >
> > > > We should decide between either a single offline node or a more
> complex
> > > > fully online solution.
> > > >
> > > > пт, 4 окт. 2019 г. в 11:55, Maxim Muzafarov <mm...@apache.org>:
> > > >
> > > >> Igniters,
> > > >>
> > > >> This thread seems to be endless, but we if some kind of cache group
> > > >> distributed write lock (exclusive for some of the internal Ignite
> > > >> process) will be introduced? I think it will help to solve a batch
> of
> > > >> problems, like:
> > > >>
> > > >> 1. defragmentation of all cache group partitions on the local node
> > > >> without concurrent updates.
> > > >> 2. improve data loading with data streamer isolation mode [1]. It
> > > >> seems we should not allow concurrent updates to cache if we on `fast
> > > >> data load` step.
> > > >> 3. recovery from a snapshot without cache stop\start actions
> > > >>
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/IGNITE-11793
> > > >>
> > > >> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com>
> > > wrote:
> > > >> >
> > > >> > Hi
> > > >> >
> > > >> > I'm not sure that node offline is a best way to do that.
> > > >> > Cons:
> > > >> >  - different caches may have different defragmentation but we
> force
> > to
> > > >> stop
> > > >> > whole node
> > > >> >  - offline node is a maintenance operation will require to add +1
> > > >> backup to
> > > >> > reduce the risk of data loss
> > > >> >  - baseline auto adjustment?
> > > >> >  - impact to index rebuild?
> > > >> >  - cache configuration changes (or destroy) during node offline
> > > >> >
> > > >> > What about other ways without node stop? E.g. make cache group on
> a
> > > node
> > > >> > offline? Add *defrag <cache_group> *command to control.sh to force
> > > start
> > > >> > rebalance internally in the node with expected impact to
> > performance.
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org>
> > > wrote:
> > > >> >
> > > >> > > Alexey,
> > > >> > > As for me, it does not matter will it be IEP, umbrella or a
> single
> > > >> issue.
> > > >> > > The most important thing is Assignee :)
> > > >> > >
> > > >> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > > >> > > alexey.goncharuk@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Anton, do you think we should file a single ticket for this or
> > > >> should we
> > > >> > > go
> > > >> > > > with an IEP? As of now, the change does not look big enough
> for
> > an
> > > >> IEP
> > > >> > > for
> > > >> > > > me.
> > > >> > > >
> > > >> > > > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
> > > >> > > >
> > > >> > > > > Alexey,
> > > >> > > > >
> > > >> > > > > Sounds good to me.
> > > >> > > > >
> > > >> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > > >> > > > > alexey.goncharuk@gmail.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Anton,
> > > >> > > > > >
> > > >> > > > > > Switching a partition to and from the SHRINKING state will
> > > >> require
> > > >> > > > > > intricate synchronizations in order to properly determine
> > the
> > > >> start
> > > >> > > > > > position for historical rebalance without PME.
> > > >> > > > > >
> > > >> > > > > > I would still go with an offline-node approach, but
> instead
> > of
> > > >> > > cleaning
> > > >> > > > > the
> > > >> > > > > > persistence, we can do effective defragmentation when the
> > node
> > > >> is
> > > >> > > > offline
> > > >> > > > > > because we are sure that there is no concurrent load.
> After
> > > the
> > > >> > > > > > defragmentation completes, we bring the node back to the
> > > >> cluster and
> > > >> > > > > > historical rebalance will kick in automatically. It will
> > still
> > > >> > > require
> > > >> > > > > > manual node restarts, but since the data is not removed,
> > there
> > > >> are no
> > > >> > > > > > additional risks. Also, this will be an excellent solution
> > for
> > > >> those
> > > >> > > > who
> > > >> > > > > > can afford downtime and execute the defragment command on
> > all
> > > >> nodes
> > > >> > > in
> > > >> > > > > the
> > > >> > > > > > cluster simultaneously - this will be the fastest way
> > > possible.
> > > >> > > > > >
> > > >> > > > > > --AG
> > > >> > > > > >
> > > >> > > > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <
> > av@apache.org
> > > >:
> > > >> > > > > >
> > > >> > > > > > > Alexei,
> > > >> > > > > > > >> stopping fragmented node and removing partition data,
> > > then
> > > >> > > > starting
> > > >> > > > > it
> > > >> > > > > > > again
> > > >> > > > > > >
> > > >> > > > > > > That's exactly what we're doing to solve the
> fragmentation
> > > >> issue.
> > > >> > > > > > > The problem here is that we have to perform N/B
> > > >> restart-rebalance
> > > >> > > > > > > operations (N - cluster size, B - backups count) and it
> > > takes
> > > >> a lot
> > > >> > > > of
> > > >> > > > > > time
> > > >> > > > > > > with risks to lose the data.
> > > >> > > > > > >
> > > >> > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > >> > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Probably this should be allowed to do using public
> API,
> > > >> actually
> > > >> > > > this
> > > >> > > > > > is
> > > >> > > > > > > > same as manual rebalancing.
> > > >> > > > > > > >
> > > >> > > > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > >> > > > > > > > alexey.scherbakoff@gmail.com>:
> > > >> > > > > > > >
> > > >> > > > > > > > > The poor man's solution for the problem would be
> > > stopping
> > > >> > > > > fragmented
> > > >> > > > > > > node
> > > >> > > > > > > > > and removing partition data, then starting it again
> > > >> allowing
> > > >> > > full
> > > >> > > > > > state
> > > >> > > > > > > > > transfer already without deletes.
> > > >> > > > > > > > > Rinse and repeat for all owners.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Anton Vinogradov, would this work for you as
> > workaround
> > > ?
> > > >> > > > > > > > >
> > > >> > > > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <
> > > >> av@apache.org
> > > >> > > >:
> > > >> > > > > > > > >
> > > >> > > > > > > > >> Alexey,
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> Let's combine your and Ivan's proposals.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> >> vacuum command, which acquires exclusive table
> > lock,
> > > >> so no
> > > >> > > > > > > concurrent
> > > >> > > > > > > > >> activities on the table are possible.
> > > >> > > > > > > > >> and
> > > >> > > > > > > > >> >> Could the problem be solved by stopping a node
> > which
> > > >> needs
> > > >> > > to
> > > >> > > > > be
> > > >> > > > > > > > >> defragmented, clearing persistence files and
> > restarting
> > > >> the
> > > >> > > > node?
> > > >> > > > > > > > >> >> After rebalancing the node will receive all data
> > > back
> > > >> > > without
> > > >> > > > > > > > >> fragmentation.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> How about to have special partition state
> SHRINKING?
> > > >> > > > > > > > >> This state should mean that partition unavailable
> for
> > > >> reads
> > > >> > > and
> > > >> > > > > > > updates
> > > >> > > > > > > > >> but
> > > >> > > > > > > > >> should keep it's update-counters and should not be
> > > >> marked as
> > > >> > > > lost,
> > > >> > > > > > > > renting
> > > >> > > > > > > > >> or evicted.
> > > >> > > > > > > > >> At this state we able to iterate over the partition
> > and
> > > >> apply
> > > >> > > > it's
> > > >> > > > > > > > entries
> > > >> > > > > > > > >> to another file in a compact way.
> > > >> > > > > > > > >> Indices should be updated during the copy-on-shrink
> > > >> procedure
> > > >> > > or
> > > >> > > > > at
> > > >> > > > > > > the
> > > >> > > > > > > > >> shrink completion.
> > > >> > > > > > > > >> Once shrank file is ready we should replace the
> > > original
> > > >> > > > partition
> > > >> > > > > > > file
> > > >> > > > > > > > >> with it and mark it as MOVING which will start the
> > > >> historical
> > > >> > > > > > > rebalance.
> > > >> > > > > > > > >> Shrinking should be performed during the low
> activity
> > > >> periods,
> > > >> > > > but
> > > >> > > > > > > even
> > > >> > > > > > > > in
> > > >> > > > > > > > >> case we found that activity was high and historical
> > > >> rebalance
> > > >> > > is
> > > >> > > > > not
> > > >> > > > > > > > >> suitable we may just remove the file and use
> regular
> > > >> rebalance
> > > >> > > > to
> > > >> > > > > > > > restore
> > > >> > > > > > > > >> the partition (this will also lead to shrink).
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> BTW, seems, we able to implement partition shrink
> in
> > a
> > > >> cheap
> > > >> > > > way.
> > > >> > > > > > > > >> We may just use rebalancing code to apply fat
> > > partition's
> > > >> > > > entries
> > > >> > > > > to
> > > >> > > > > > > the
> > > >> > > > > > > > >> new file.
> > > >> > > > > > > > >> So, 3 stages here: local rebalance, indices update
> > and
> > > >> global
> > > >> > > > > > > historical
> > > >> > > > > > > > >> rebalance.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > >> > > > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> > Anton,
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > > >>  The solution which Anton suggested does not
> > > look
> > > >> easy
> > > >> > > > > > because
> > > >> > > > > > > it
> > > >> > > > > > > > >> will
> > > >> > > > > > > > >> > > most likely significantly hurt performance
> > > >> > > > > > > > >> > > Mostly agree here, but what drop do we expect?
> > What
> > > >> price
> > > >> > > do
> > > >> > > > > we
> > > >> > > > > > > > ready
> > > >> > > > > > > > >> to
> > > >> > > > > > > > >> > > pay?
> > > >> > > > > > > > >> > > Not sure, but seems some vendors ready to pay,
> > for
> > > >> > > example,
> > > >> > > > 5%
> > > >> > > > > > > drop
> > > >> > > > > > > > >> for
> > > >> > > > > > > > >> > > this.
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > 5% may be a big drop for some use-cases, so I
> think
> > > we
> > > >> > > should
> > > >> > > > > look
> > > >> > > > > > > at
> > > >> > > > > > > > >> how
> > > >> > > > > > > > >> > to improve performance, not how to make it worse.
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > >> it is hard to maintain a data structure to
> > > choose
> > > >> "page
> > > >> > > > > from
> > > >> > > > > > > > >> free-list
> > > >> > > > > > > > >> > > with enough space closest to the beginning of
> the
> > > >> file".
> > > >> > > > > > > > >> > > We can just split each free-list bucket to the
> > > >> couple and
> > > >> > > > use
> > > >> > > > > > > first
> > > >> > > > > > > > >> for
> > > >> > > > > > > > >> > > pages in the first half of the file and the
> > second
> > > >> for the
> > > >> > > > > last.
> > > >> > > > > > > > >> > > Only two buckets required here since, during
> the
> > > file
> > > >> > > > shrink,
> > > >> > > > > > > first
> > > >> > > > > > > > >> > > bucket's window will be shrank too.
> > > >> > > > > > > > >> > > Seems, this give us the same price on put, just
> > use
> > > >> the
> > > >> > > > first
> > > >> > > > > > > bucket
> > > >> > > > > > > > >> in
> > > >> > > > > > > > >> > > case it's not empty.
> > > >> > > > > > > > >> > > Remove price (with merge) will be increased, of
> > > >> course.
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > The compromise solution is to have priority put
> > (to
> > > >> the
> > > >> > > > first
> > > >> > > > > > path
> > > >> > > > > > > > of
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > file), with keeping removal as is, and
> > schedulable
> > > >> > > per-page
> > > >> > > > > > > > migration
> > > >> > > > > > > > >> for
> > > >> > > > > > > > >> > > the rest of the data during the low activity
> > > period.
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > Free lists are large and slow by themselves, it
> is
> > > >> expensive
> > > >> > > > to
> > > >> > > > > > > > >> checkpoint
> > > >> > > > > > > > >> > and read them on start, so as a long-term
> solution
> > I
> > > >> would
> > > >> > > > look
> > > >> > > > > > into
> > > >> > > > > > > > >> > removing them. Moreover, not sure if adding yet
> > > another
> > > >> > > > > background
> > > >> > > > > > > > >> process
> > > >> > > > > > > > >> > will improve the codebase reliability and
> > simplicity.
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > If we want to go the hard path, I would look at
> > free
> > > >> page
> > > >> > > > > tracking
> > > >> > > > > > > > >> bitmap -
> > > >> > > > > > > > >> > a special bitmask page, where each page in an
> > > adjacent
> > > >> block
> > > >> > > > is
> > > >> > > > > > > marked
> > > >> > > > > > > > >> as 0
> > > >> > > > > > > > >> > if it has free space more than a certain
> > configurable
> > > >> > > > threshold
> > > >> > > > > > > (say,
> > > >> > > > > > > > >> 80%)
> > > >> > > > > > > > >> > - free, and 1 if less (full). Some vendors have
> > > >> successfully
> > > >> > > > > > > > implemented
> > > >> > > > > > > > >> > this approach, which looks much more promising,
> but
> > > >> harder
> > > >> > > to
> > > >> > > > > > > > implement.
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > --AG
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >>
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > --
> > > >> > > > > > > > >
> > > >> > > > > > > > > Best regards,
> > > >> > > > > > > > > Alexei Scherbakov
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > --
> > > >> > > > > > > >
> > > >> > > > > > > > Best regards,
> > > >> > > > > > > > Alexei Scherbakov
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Sergey Kozlov
> > > >> > GridGain Systems
> > > >> > www.gridgain.com
> > > >>
> > > >
> > >
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Anton Vinogradov <av...@apache.org>.
Denis,

I like the idea that defragmentation is just an additional step on a node
(re)start, like the PDS recovery we perform now.
We may just use a special key to specify that a node should defragment its
persistence on (re)start.
Defragmentation can be part of Rolling Upgrade in this case :)
It does not seem to be a problem to restart nodes one-by-one; this will "eat"
only one backup guarantee.
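
A minimal sketch of such a special key checked at startup; the property name
and the defragmentLocalPersistence() step are hypothetical, not an existing
Ignite API:

    public class DefragOnStart {
        public static void main(String[] args) {
            // Hypothetical flag: -Dignite.defragmentation.on.start=true
            boolean defrag = Boolean.getBoolean("ignite.defragmentation.on.start");

            if (defrag)
                defragmentLocalPersistence(); // rewrite partition files compactly while offline

            // ...then start the node as usual; historical rebalance catches it up, e.g.:
            // Ignite ignite = Ignition.start("config/node.xml");
        }

        private static void defragmentLocalPersistence() {
            // copy live entries into compact partition files, drop the old ones
        }
    }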

On Mon, Oct 7, 2019 at 8:28 PM Denis Magda <dm...@apache.org> wrote:

> Alex, thanks for the summary and proposal. Anton, Ivan and others who took
> part in this discussion, what're your thoughts? I see this
> rolling-upgrades-based approach as a reasonable solution. Even though a
> node shutdown is expected, the procedure doesn't lead to the cluster outage
> meaning it can be utilized for 24x7 production environments.
>
> -
> Denis
>
>
> On Mon, Oct 7, 2019 at 1:35 AM Alexey Goncharuk <
> alexey.goncharuk@gmail.com>
> wrote:
>
> > Created a ticket for the first stage of this improvement. This can be a
> > first change towards the online mode suggested by Sergey and Anton.
> > https://issues.apache.org/jira/browse/IGNITE-12263
> >
> > пт, 4 окт. 2019 г. в 19:38, Alexey Goncharuk <alexey.goncharuk@gmail.com
> >:
> >
> > > Maxim,
> > >
> > > Having a cluster-wide lock for a cache does not improve availability of
> > > the solution. A user cannot defragment a cache if the cache is involved
> > in
> > > a mission-critical operation, so having a lock on such a cache is
> > > equivalent to the whole cluster shutdown.
> > >
> > > We should decide between either a single offline node or a more complex
> > > fully online solution.
> > >
> > > пт, 4 окт. 2019 г. в 11:55, Maxim Muzafarov <mm...@apache.org>:
> > >
> > >> Igniters,
> > >>
> > >> This thread seems to be endless, but we if some kind of cache group
> > >> distributed write lock (exclusive for some of the internal Ignite
> > >> process) will be introduced? I think it will help to solve a batch of
> > >> problems, like:
> > >>
> > >> 1. defragmentation of all cache group partitions on the local node
> > >> without concurrent updates.
> > >> 2. improve data loading with data streamer isolation mode [1]. It
> > >> seems we should not allow concurrent updates to cache if we on `fast
> > >> data load` step.
> > >> 3. recovery from a snapshot without cache stop\start actions
> > >>
> > >>
> > >> [1] https://issues.apache.org/jira/browse/IGNITE-11793
> > >>
> > >> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com>
> > wrote:
> > >> >
> > >> > Hi
> > >> >
> > >> > I'm not sure that node offline is a best way to do that.
> > >> > Cons:
> > >> >  - different caches may have different defragmentation but we force
> to
> > >> stop
> > >> > whole node
> > >> >  - offline node is a maintenance operation will require to add +1
> > >> backup to
> > >> > reduce the risk of data loss
> > >> >  - baseline auto adjustment?
> > >> >  - impact to index rebuild?
> > >> >  - cache configuration changes (or destroy) during node offline
> > >> >
> > >> > What about other ways without node stop? E.g. make cache group on a
> > node
> > >> > offline? Add *defrag <cache_group> *command to control.sh to force
> > start
> > >> > rebalance internally in the node with expected impact to
> performance.
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org>
> > wrote:
> > >> >
> > >> > > Alexey,
> > >> > > As for me, it does not matter will it be IEP, umbrella or a single
> > >> issue.
> > >> > > The most important thing is Assignee :)
> > >> > >
> > >> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > >> > > alexey.goncharuk@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Anton, do you think we should file a single ticket for this or
> > >> should we
> > >> > > go
> > >> > > > with an IEP? As of now, the change does not look big enough for
> an
> > >> IEP
> > >> > > for
> > >> > > > me.
> > >> > > >
> > >> > > > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
> > >> > > >
> > >> > > > > Alexey,
> > >> > > > >
> > >> > > > > Sounds good to me.
> > >> > > > >
> > >> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > >> > > > > alexey.goncharuk@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Anton,
> > >> > > > > >
> > >> > > > > > Switching a partition to and from the SHRINKING state will
> > >> require
> > >> > > > > > intricate synchronizations in order to properly determine
> the
> > >> start
> > >> > > > > > position for historical rebalance without PME.
> > >> > > > > >
> > >> > > > > > I would still go with an offline-node approach, but instead
> of
> > >> > > cleaning
> > >> > > > > the
> > >> > > > > > persistence, we can do effective defragmentation when the
> node
> > >> is
> > >> > > > offline
> > >> > > > > > because we are sure that there is no concurrent load. After
> > the
> > >> > > > > > defragmentation completes, we bring the node back to the
> > >> cluster and
> > >> > > > > > historical rebalance will kick in automatically. It will
> still
> > >> > > require
> > >> > > > > > manual node restarts, but since the data is not removed,
> there
> > >> are no
> > >> > > > > > additional risks. Also, this will be an excellent solution
> for
> > >> those
> > >> > > > who
> > >> > > > > > can afford downtime and execute the defragment command on
> all
> > >> nodes
> > >> > > in
> > >> > > > > the
> > >> > > > > > cluster simultaneously - this will be the fastest way
> > possible.
> > >> > > > > >
> > >> > > > > > --AG
> > >> > > > > >
> > >> > > > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <
> av@apache.org
> > >:
> > >> > > > > >
> > >> > > > > > > Alexei,
> > >> > > > > > > >> stopping fragmented node and removing partition data,
> > then
> > >> > > > starting
> > >> > > > > it
> > >> > > > > > > again
> > >> > > > > > >
> > >> > > > > > > That's exactly what we're doing to solve the fragmentation
> > >> issue.
> > >> > > > > > > The problem here is that we have to perform N/B
> > >> restart-rebalance
> > >> > > > > > > operations (N - cluster size, B - backups count) and it
> > takes
> > >> a lot
> > >> > > > of
> > >> > > > > > time
> > >> > > > > > > with risks to lose the data.
> > >> > > > > > >
> > >> > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > >> > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > >> > > > > > >
> > >> > > > > > > > Probably this should be allowed to do using public API,
> > >> actually
> > >> > > > this
> > >> > > > > > is
> > >> > > > > > > > same as manual rebalancing.
> > >> > > > > > > >
> > >> > > > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > >> > > > > > > > alexey.scherbakoff@gmail.com>:
> > >> > > > > > > >
> > >> > > > > > > > > The poor man's solution for the problem would be
> > stopping
> > >> > > > > fragmented
> > >> > > > > > > node
> > >> > > > > > > > > and removing partition data, then starting it again
> > >> allowing
> > >> > > full
> > >> > > > > > state
> > >> > > > > > > > > transfer already without deletes.
> > >> > > > > > > > > Rinse and repeat for all owners.
> > >> > > > > > > > >
> > >> > > > > > > > > Anton Vinogradov, would this work for you as
> workaround
> > ?
> > >> > > > > > > > >
> > >> > > > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <
> > >> av@apache.org
> > >> > > >:
> > >> > > > > > > > >
> > >> > > > > > > > >> Alexey,
> > >> > > > > > > > >>
> > >> > > > > > > > >> Let's combine your and Ivan's proposals.
> > >> > > > > > > > >>
> > >> > > > > > > > >> >> vacuum command, which acquires exclusive table
> lock,
> > >> so no
> > >> > > > > > > concurrent
> > >> > > > > > > > >> activities on the table are possible.
> > >> > > > > > > > >> and
> > >> > > > > > > > >> >> Could the problem be solved by stopping a node
> which
> > >> needs
> > >> > > to
> > >> > > > > be
> > >> > > > > > > > >> defragmented, clearing persistence files and
> restarting
> > >> the
> > >> > > > node?
> > >> > > > > > > > >> >> After rebalancing the node will receive all data
> > back
> > >> > > without
> > >> > > > > > > > >> fragmentation.
> > >> > > > > > > > >>
> > >> > > > > > > > >> How about to have special partition state SHRINKING?
> > >> > > > > > > > >> This state should mean that partition unavailable for
> > >> reads
> > >> > > and
> > >> > > > > > > updates
> > >> > > > > > > > >> but
> > >> > > > > > > > >> should keep it's update-counters and should not be
> > >> marked as
> > >> > > > lost,
> > >> > > > > > > > renting
> > >> > > > > > > > >> or evicted.
> > >> > > > > > > > >> At this state we able to iterate over the partition
> and
> > >> apply
> > >> > > > it's
> > >> > > > > > > > entries
> > >> > > > > > > > >> to another file in a compact way.
> > >> > > > > > > > >> Indices should be updated during the copy-on-shrink
> > >> procedure
> > >> > > or
> > >> > > > > at
> > >> > > > > > > the
> > >> > > > > > > > >> shrink completion.
> > >> > > > > > > > >> Once shrank file is ready we should replace the
> > original
> > >> > > > partition
> > >> > > > > > > file
> > >> > > > > > > > >> with it and mark it as MOVING which will start the
> > >> historical
> > >> > > > > > > rebalance.
> > >> > > > > > > > >> Shrinking should be performed during the low activity
> > >> periods,
> > >> > > > but
> > >> > > > > > > even
> > >> > > > > > > > in
> > >> > > > > > > > >> case we found that activity was high and historical
> > >> rebalance
> > >> > > is
> > >> > > > > not
> > >> > > > > > > > >> suitable we may just remove the file and use regular
> > >> rebalance
> > >> > > > to
> > >> > > > > > > > restore
> > >> > > > > > > > >> the partition (this will also lead to shrink).
> > >> > > > > > > > >>
> > >> > > > > > > > >> BTW, seems, we able to implement partition shrink in
> a
> > >> cheap
> > >> > > > way.
> > >> > > > > > > > >> We may just use rebalancing code to apply fat
> > partition's
> > >> > > > entries
> > >> > > > > to
> > >> > > > > > > the
> > >> > > > > > > > >> new file.
> > >> > > > > > > > >> So, 3 stages here: local rebalance, indices update
> and
> > >> global
> > >> > > > > > > historical
> > >> > > > > > > > >> rebalance.
> > >> > > > > > > > >>
> > >> > > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > >> > > > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > >> > > > > > > > >>
> > >> > > > > > > > >> > Anton,
> > >> > > > > > > > >> >
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > > >>  The solution which Anton suggested does not
> > look
> > >> easy
> > >> > > > > > because
> > >> > > > > > > it
> > >> > > > > > > > >> will
> > >> > > > > > > > >> > > most likely significantly hurt performance
> > >> > > > > > > > >> > > Mostly agree here, but what drop do we expect?
> What
> > >> price
> > >> > > do
> > >> > > > > we
> > >> > > > > > > > ready
> > >> > > > > > > > >> to
> > >> > > > > > > > >> > > pay?
> > >> > > > > > > > >> > > Not sure, but seems some vendors ready to pay,
> for
> > >> > > example,
> > >> > > > 5%
> > >> > > > > > > drop
> > >> > > > > > > > >> for
> > >> > > > > > > > >> > > this.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > 5% may be a big drop for some use-cases, so I think
> > we
> > >> > > should
> > >> > > > > look
> > >> > > > > > > at
> > >> > > > > > > > >> how
> > >> > > > > > > > >> > to improve performance, not how to make it worse.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > >> it is hard to maintain a data structure to
> > choose
> > >> "page
> > >> > > > > from
> > >> > > > > > > > >> free-list
> > >> > > > > > > > >> > > with enough space closest to the beginning of the
> > >> file".
> > >> > > > > > > > >> > > We can just split each free-list bucket to the
> > >> couple and
> > >> > > > use
> > >> > > > > > > first
> > >> > > > > > > > >> for
> > >> > > > > > > > >> > > pages in the first half of the file and the
> second
> > >> for the
> > >> > > > > last.
> > >> > > > > > > > >> > > Only two buckets required here since, during the
> > file
> > >> > > > shrink,
> > >> > > > > > > first
> > >> > > > > > > > >> > > bucket's window will be shrank too.
> > >> > > > > > > > >> > > Seems, this give us the same price on put, just
> use
> > >> the
> > >> > > > first
> > >> > > > > > > bucket
> > >> > > > > > > > >> in
> > >> > > > > > > > >> > > case it's not empty.
> > >> > > > > > > > >> > > Remove price (with merge) will be increased, of
> > >> course.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The compromise solution is to have priority put
> (to
> > >> the
> > >> > > > first
> > >> > > > > > path
> > >> > > > > > > > of
> > >> > > > > > > > >> the
> > >> > > > > > > > >> > > file), with keeping removal as is, and
> schedulable
> > >> > > per-page
> > >> > > > > > > > migration
> > >> > > > > > > > >> for
> > >> > > > > > > > >> > > the rest of the data during the low activity
> > period.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > Free lists are large and slow by themselves, it is
> > >> expensive
> > >> > > > to
> > >> > > > > > > > >> checkpoint
> > >> > > > > > > > >> > and read them on start, so as a long-term solution
> I
> > >> would
> > >> > > > look
> > >> > > > > > into
> > >> > > > > > > > >> > removing them. Moreover, not sure if adding yet
> > another
> > >> > > > > background
> > >> > > > > > > > >> process
> > >> > > > > > > > >> > will improve the codebase reliability and
> simplicity.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > If we want to go the hard path, I would look at
> free
> > >> page
> > >> > > > > tracking
> > >> > > > > > > > >> bitmap -
> > >> > > > > > > > >> > a special bitmask page, where each page in an
> > adjacent
> > >> block
> > >> > > > is
> > >> > > > > > > marked
> > >> > > > > > > > >> as 0
> > >> > > > > > > > >> > if it has free space more than a certain
> configurable
> > >> > > > threshold
> > >> > > > > > > (say,
> > >> > > > > > > > >> 80%)
> > >> > > > > > > > >> > - free, and 1 if less (full). Some vendors have
> > >> successfully
> > >> > > > > > > > implemented
> > >> > > > > > > > >> > this approach, which looks much more promising, but
> > >> harder
> > >> > > to
> > >> > > > > > > > implement.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > --AG
> > >> > > > > > > > >> >
> > >> > > > > > > > >>
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > >
> > >> > > > > > > > > Best regards,
> > >> > > > > > > > > Alexei Scherbakov
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > >
> > >> > > > > > > > Best regards,
> > >> > > > > > > > Alexei Scherbakov
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Sergey Kozlov
> > >> > GridGain Systems
> > >> > www.gridgain.com
> > >>
> > >
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Denis Magda <dm...@apache.org>.
Alex, thanks for the summary and proposal. Anton, Ivan and others who took
part in this discussion, what are your thoughts? I see this
rolling-upgrades-based approach as a reasonable solution. Even though a
node shutdown is expected, the procedure doesn't lead to a cluster outage,
meaning it can be utilized in 24x7 production environments.

-
Denis


On Mon, Oct 7, 2019 at 1:35 AM Alexey Goncharuk <al...@gmail.com>
wrote:

> Created a ticket for the first stage of this improvement. This can be a
> first change towards the online mode suggested by Sergey and Anton.
> https://issues.apache.org/jira/browse/IGNITE-12263
>
> пт, 4 окт. 2019 г. в 19:38, Alexey Goncharuk <al...@gmail.com>:
>
> > Maxim,
> >
> > Having a cluster-wide lock for a cache does not improve availability of
> > the solution. A user cannot defragment a cache if the cache is involved
> in
> > a mission-critical operation, so having a lock on such a cache is
> > equivalent to the whole cluster shutdown.
> >
> > We should decide between either a single offline node or a more complex
> > fully online solution.
> >
> > пт, 4 окт. 2019 г. в 11:55, Maxim Muzafarov <mm...@apache.org>:
> >
> >> Igniters,
> >>
> >> This thread seems to be endless, but we if some kind of cache group
> >> distributed write lock (exclusive for some of the internal Ignite
> >> process) will be introduced? I think it will help to solve a batch of
> >> problems, like:
> >>
> >> 1. defragmentation of all cache group partitions on the local node
> >> without concurrent updates.
> >> 2. improve data loading with data streamer isolation mode [1]. It
> >> seems we should not allow concurrent updates to cache if we on `fast
> >> data load` step.
> >> 3. recovery from a snapshot without cache stop\start actions
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/IGNITE-11793
> >>
> >> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com>
> wrote:
> >> >
> >> > Hi
> >> >
> >> > I'm not sure that node offline is a best way to do that.
> >> > Cons:
> >> >  - different caches may have different defragmentation but we force to
> >> stop
> >> > whole node
> >> >  - offline node is a maintenance operation will require to add +1
> >> backup to
> >> > reduce the risk of data loss
> >> >  - baseline auto adjustment?
> >> >  - impact to index rebuild?
> >> >  - cache configuration changes (or destroy) during node offline
> >> >
> >> > What about other ways without node stop? E.g. make cache group on a
> node
> >> > offline? Add *defrag <cache_group> *command to control.sh to force
> start
> >> > rebalance internally in the node with expected impact to performance.
> >> >
> >> >
> >> >
> >> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org>
> wrote:
> >> >
> >> > > Alexey,
> >> > > As for me, it does not matter will it be IEP, umbrella or a single
> >> issue.
> >> > > The most important thing is Assignee :)
> >> > >
> >> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> >> > > alexey.goncharuk@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Anton, do you think we should file a single ticket for this or
> >> should we
> >> > > go
> >> > > > with an IEP? As of now, the change does not look big enough for an
> >> IEP
> >> > > for
> >> > > > me.
> >> > > >
> >> > > > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
> >> > > >
> >> > > > > Alexey,
> >> > > > >
> >> > > > > Sounds good to me.
> >> > > > >
> >> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> >> > > > > alexey.goncharuk@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Anton,
> >> > > > > >
> >> > > > > > Switching a partition to and from the SHRINKING state will
> >> require
> >> > > > > > intricate synchronizations in order to properly determine the
> >> start
> >> > > > > > position for historical rebalance without PME.
> >> > > > > >
> >> > > > > > I would still go with an offline-node approach, but instead of
> >> > > cleaning
> >> > > > > the
> >> > > > > > persistence, we can do effective defragmentation when the node
> >> is
> >> > > > offline
> >> > > > > > because we are sure that there is no concurrent load. After
> the
> >> > > > > > defragmentation completes, we bring the node back to the
> >> cluster and
> >> > > > > > historical rebalance will kick in automatically. It will still
> >> > > require
> >> > > > > > manual node restarts, but since the data is not removed, there
> >> are no
> >> > > > > > additional risks. Also, this will be an excellent solution for
> >> those
> >> > > > who
> >> > > > > > can afford downtime and execute the defragment command on all
> >> nodes
> >> > > in
> >> > > > > the
> >> > > > > > cluster simultaneously - this will be the fastest way
> possible.
> >> > > > > >
> >> > > > > > --AG
> >> > > > > >
> >> > > > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av@apache.org
> >:
> >> > > > > >
> >> > > > > > > Alexei,
> >> > > > > > > >> stopping fragmented node and removing partition data,
> then
> >> > > > starting
> >> > > > > it
> >> > > > > > > again
> >> > > > > > >
> >> > > > > > > That's exactly what we're doing to solve the fragmentation
> >> issue.
> >> > > > > > > The problem here is that we have to perform N/B
> >> restart-rebalance
> >> > > > > > > operations (N - cluster size, B - backups count) and it
> takes
> >> a lot
> >> > > > of
> >> > > > > > time
> >> > > > > > > with risks to lose the data.
> >> > > > > > >
> >> > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> >> > > > > > > alexey.scherbakoff@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > > Probably this should be allowed to do using public API,
> >> actually
> >> > > > this
> >> > > > > > is
> >> > > > > > > > same as manual rebalancing.
> >> > > > > > > >
> >> > > > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> >> > > > > > > > alexey.scherbakoff@gmail.com>:
> >> > > > > > > >
> >> > > > > > > > > The poor man's solution for the problem would be
> stopping
> >> > > > > fragmented
> >> > > > > > > node
> >> > > > > > > > > and removing partition data, then starting it again
> >> allowing
> >> > > full
> >> > > > > > state
> >> > > > > > > > > transfer already without deletes.
> >> > > > > > > > > Rinse and repeat for all owners.
> >> > > > > > > > >
> >> > > > > > > > > Anton Vinogradov, would this work for you as workaround
> ?
> >> > > > > > > > >
> >> > > > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <
> >> av@apache.org
> >> > > >:
> >> > > > > > > > >
> >> > > > > > > > >> Alexey,
> >> > > > > > > > >>
> >> > > > > > > > >> Let's combine your and Ivan's proposals.
> >> > > > > > > > >>
> >> > > > > > > > >> >> vacuum command, which acquires exclusive table lock,
> >> so no
> >> > > > > > > concurrent
> >> > > > > > > > >> activities on the table are possible.
> >> > > > > > > > >> and
> >> > > > > > > > >> >> Could the problem be solved by stopping a node which
> >> needs
> >> > > to
> >> > > > > be
> >> > > > > > > > >> defragmented, clearing persistence files and restarting
> >> the
> >> > > > node?
> >> > > > > > > > >> >> After rebalancing the node will receive all data
> back
> >> > > without
> >> > > > > > > > >> fragmentation.
> >> > > > > > > > >>
> >> > > > > > > > >> How about to have special partition state SHRINKING?
> >> > > > > > > > >> This state should mean that partition unavailable for
> >> reads
> >> > > and
> >> > > > > > > updates
> >> > > > > > > > >> but
> >> > > > > > > > >> should keep it's update-counters and should not be
> >> marked as
> >> > > > lost,
> >> > > > > > > > renting
> >> > > > > > > > >> or evicted.
> >> > > > > > > > >> At this state we able to iterate over the partition and
> >> apply
> >> > > > it's
> >> > > > > > > > entries
> >> > > > > > > > >> to another file in a compact way.
> >> > > > > > > > >> Indices should be updated during the copy-on-shrink
> >> procedure
> >> > > or
> >> > > > > at
> >> > > > > > > the
> >> > > > > > > > >> shrink completion.
> >> > > > > > > > >> Once shrank file is ready we should replace the
> original
> >> > > > partition
> >> > > > > > > file
> >> > > > > > > > >> with it and mark it as MOVING which will start the
> >> historical
> >> > > > > > > rebalance.
> >> > > > > > > > >> Shrinking should be performed during the low activity
> >> periods,
> >> > > > but
> >> > > > > > > even
> >> > > > > > > > in
> >> > > > > > > > >> case we found that activity was high and historical
> >> rebalance
> >> > > is
> >> > > > > not
> >> > > > > > > > >> suitable we may just remove the file and use regular
> >> rebalance
> >> > > > to
> >> > > > > > > > restore
> >> > > > > > > > >> the partition (this will also lead to shrink).
> >> > > > > > > > >>
> >> > > > > > > > >> BTW, seems, we able to implement partition shrink in a
> >> cheap
> >> > > > way.
> >> > > > > > > > >> We may just use rebalancing code to apply fat
> partition's
> >> > > > entries
> >> > > > > to
> >> > > > > > > the
> >> > > > > > > > >> new file.
> >> > > > > > > > >> So, 3 stages here: local rebalance, indices update and
> >> global
> >> > > > > > > historical
> >> > > > > > > > >> rebalance.
> >> > > > > > > > >>
> >> > > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> >> > > > > > > > >> alexey.goncharuk@gmail.com> wrote:
> >> > > > > > > > >>
> >> > > > > > > > >> > Anton,
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> > > >>  The solution which Anton suggested does not
> look
> >> easy
> >> > > > > > because
> >> > > > > > > it
> >> > > > > > > > >> will
> >> > > > > > > > >> > > most likely significantly hurt performance
> >> > > > > > > > >> > > Mostly agree here, but what drop do we expect? What
> >> price
> >> > > do
> >> > > > > we
> >> > > > > > > > ready
> >> > > > > > > > >> to
> >> > > > > > > > >> > > pay?
> >> > > > > > > > >> > > Not sure, but seems some vendors ready to pay, for
> >> > > example,
> >> > > > 5%
> >> > > > > > > drop
> >> > > > > > > > >> for
> >> > > > > > > > >> > > this.
> >> > > > > > > > >> >
> >> > > > > > > > >> > 5% may be a big drop for some use-cases, so I think
> we
> >> > > should
> >> > > > > look
> >> > > > > > > at
> >> > > > > > > > >> how
> >> > > > > > > > >> > to improve performance, not how to make it worse.
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > >> it is hard to maintain a data structure to
> choose
> >> "page
> >> > > > > from
> >> > > > > > > > >> free-list
> >> > > > > > > > >> > > with enough space closest to the beginning of the
> >> file".
> >> > > > > > > > >> > > We can just split each free-list bucket to the
> >> couple and
> >> > > > use
> >> > > > > > > first
> >> > > > > > > > >> for
> >> > > > > > > > >> > > pages in the first half of the file and the second
> >> for the
> >> > > > > last.
> >> > > > > > > > >> > > Only two buckets required here since, during the
> file
> >> > > > shrink,
> >> > > > > > > first
> >> > > > > > > > >> > > bucket's window will be shrank too.
> >> > > > > > > > >> > > Seems, this give us the same price on put, just use
> >> the
> >> > > > first
> >> > > > > > > bucket
> >> > > > > > > > >> in
> >> > > > > > > > >> > > case it's not empty.
> >> > > > > > > > >> > > Remove price (with merge) will be increased, of
> >> course.
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > The compromise solution is to have priority put (to
> >> the
> >> > > > first
> >> > > > > > path
> >> > > > > > > > of
> >> > > > > > > > >> the
> >> > > > > > > > >> > > file), with keeping removal as is, and schedulable
> >> > > per-page
> >> > > > > > > > migration
> >> > > > > > > > >> for
> >> > > > > > > > >> > > the rest of the data during the low activity
> period.
> >> > > > > > > > >> > >
> >> > > > > > > > >> > Free lists are large and slow by themselves, it is
> >> expensive
> >> > > > to
> >> > > > > > > > >> checkpoint
> >> > > > > > > > >> > and read them on start, so as a long-term solution I
> >> would
> >> > > > look
> >> > > > > > into
> >> > > > > > > > >> > removing them. Moreover, not sure if adding yet
> another
> >> > > > > background
> >> > > > > > > > >> process
> >> > > > > > > > >> > will improve the codebase reliability and simplicity.
> >> > > > > > > > >> >
> >> > > > > > > > >> > If we want to go the hard path, I would look at free
> >> page
> >> > > > > tracking
> >> > > > > > > > >> bitmap -
> >> > > > > > > > >> > a special bitmask page, where each page in an
> adjacent
> >> block
> >> > > > is
> >> > > > > > > marked
> >> > > > > > > > >> as 0
> >> > > > > > > > >> > if it has free space more than a certain configurable
> >> > > > threshold
> >> > > > > > > (say,
> >> > > > > > > > >> 80%)
> >> > > > > > > > >> > - free, and 1 if less (full). Some vendors have
> >> successfully
> >> > > > > > > > implemented
> >> > > > > > > > >> > this approach, which looks much more promising, but
> >> harder
> >> > > to
> >> > > > > > > > implement.
> >> > > > > > > > >> >
> >> > > > > > > > >> > --AG
> >> > > > > > > > >> >
> >> > > > > > > > >>
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > >
> >> > > > > > > > > Best regards,
> >> > > > > > > > > Alexei Scherbakov
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > >
> >> > > > > > > > Best regards,
> >> > > > > > > > Alexei Scherbakov
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Sergey Kozlov
> >> > GridGain Systems
> >> > www.gridgain.com
> >>
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexey Goncharuk <al...@gmail.com>.
Created a ticket for the first stage of this improvement. This can be a
first change towards the online mode suggested by Sergey and Anton.
https://issues.apache.org/jira/browse/IGNITE-12263

пт, 4 окт. 2019 г. в 19:38, Alexey Goncharuk <al...@gmail.com>:

> Maxim,
>
> Having a cluster-wide lock for a cache does not improve availability of
> the solution. A user cannot defragment a cache if the cache is involved in
> a mission-critical operation, so having a lock on such a cache is
> equivalent to the whole cluster shutdown.
>
> We should decide between either a single offline node or a more complex
> fully online solution.
>
> пт, 4 окт. 2019 г. в 11:55, Maxim Muzafarov <mm...@apache.org>:
>
>> Igniters,
>>
>> This thread seems to be endless, but we if some kind of cache group
>> distributed write lock (exclusive for some of the internal Ignite
>> process) will be introduced? I think it will help to solve a batch of
>> problems, like:
>>
>> 1. defragmentation of all cache group partitions on the local node
>> without concurrent updates.
>> 2. improve data loading with data streamer isolation mode [1]. It
>> seems we should not allow concurrent updates to cache if we on `fast
>> data load` step.
>> 3. recovery from a snapshot without cache stop\start actions
>>
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-11793
>>
>> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com> wrote:
>> >
>> > Hi
>> >
>> > I'm not sure that node offline is a best way to do that.
>> > Cons:
>> >  - different caches may have different defragmentation but we force to
>> stop
>> > whole node
>> >  - offline node is a maintenance operation will require to add +1
>> backup to
>> > reduce the risk of data loss
>> >  - baseline auto adjustment?
>> >  - impact to index rebuild?
>> >  - cache configuration changes (or destroy) during node offline
>> >
>> > What about other ways without node stop? E.g. make cache group on a node
>> > offline? Add *defrag <cache_group> *command to control.sh to force start
>> > rebalance internally in the node with expected impact to performance.
>> >
>> >
>> >
>> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org> wrote:
>> >
>> > > Alexey,
>> > > As for me, it does not matter will it be IEP, umbrella or a single
>> issue.
>> > > The most important thing is Assignee :)
>> > >
>> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
>> > > alexey.goncharuk@gmail.com>
>> > > wrote:
>> > >
>> > > > Anton, do you think we should file a single ticket for this or
>> should we
>> > > go
>> > > > with an IEP? As of now, the change does not look big enough for an
>> IEP
>> > > for
>> > > > me.
>> > > >
>> > > > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
>> > > >
>> > > > > Alexey,
>> > > > >
>> > > > > Sounds good to me.
>> > > > >
>> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
>> > > > > alexey.goncharuk@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Anton,
>> > > > > >
>> > > > > > Switching a partition to and from the SHRINKING state will
>> require
>> > > > > > intricate synchronizations in order to properly determine the
>> start
>> > > > > > position for historical rebalance without PME.
>> > > > > >
>> > > > > > I would still go with an offline-node approach, but instead of
>> > > cleaning
>> > > > > the
>> > > > > > persistence, we can do effective defragmentation when the node
>> is
>> > > > offline
>> > > > > > because we are sure that there is no concurrent load. After the
>> > > > > > defragmentation completes, we bring the node back to the
>> cluster and
>> > > > > > historical rebalance will kick in automatically. It will still
>> > > require
>> > > > > > manual node restarts, but since the data is not removed, there
>> are no
>> > > > > > additional risks. Also, this will be an excellent solution for
>> those
>> > > > who
>> > > > > > can afford downtime and execute the defragment command on all
>> nodes
>> > > in
>> > > > > the
>> > > > > > cluster simultaneously - this will be the fastest way possible.
>> > > > > >
>> > > > > > --AG
>> > > > > >
>> > > > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av...@apache.org>:
>> > > > > >
>> > > > > > > Alexei,
>> > > > > > > >> stopping fragmented node and removing partition data, then
>> > > > starting
>> > > > > it
>> > > > > > > again
>> > > > > > >
>> > > > > > > That's exactly what we're doing to solve the fragmentation
>> issue.
>> > > > > > > The problem here is that we have to perform N/B
>> restart-rebalance
>> > > > > > > operations (N - cluster size, B - backups count) and it takes
>> a lot
>> > > > of
>> > > > > > time
>> > > > > > > with risks to lose the data.
>> > > > > > >
>> > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
>> > > > > > > alexey.scherbakoff@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Probably this should be allowed to do using public API,
>> actually
>> > > > this
>> > > > > > is
>> > > > > > > > same as manual rebalancing.
>> > > > > > > >
>> > > > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
>> > > > > > > > alexey.scherbakoff@gmail.com>:
>> > > > > > > >
>> > > > > > > > > The poor man's solution for the problem would be stopping
>> > > > > fragmented
>> > > > > > > node
>> > > > > > > > > and removing partition data, then starting it again
>> allowing
>> > > full
>> > > > > > state
>> > > > > > > > > transfer already without deletes.
>> > > > > > > > > Rinse and repeat for all owners.
>> > > > > > > > >
>> > > > > > > > > Anton Vinogradov, would this work for you as workaround ?
>> > > > > > > > >
>> > > > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <
>> av@apache.org
>> > > >:
>> > > > > > > > >
>> > > > > > > > >> Alexey,
>> > > > > > > > >>
>> > > > > > > > >> Let's combine your and Ivan's proposals.
>> > > > > > > > >>
>> > > > > > > > >> >> vacuum command, which acquires exclusive table lock,
>> so no
>> > > > > > > concurrent
>> > > > > > > > >> activities on the table are possible.
>> > > > > > > > >> and
>> > > > > > > > >> >> Could the problem be solved by stopping a node which
>> needs
>> > > to
>> > > > > be
>> > > > > > > > >> defragmented, clearing persistence files and restarting
>> the
>> > > > node?
>> > > > > > > > >> >> After rebalancing the node will receive all data back
>> > > without
>> > > > > > > > >> fragmentation.
>> > > > > > > > >>
>> > > > > > > > >> How about to have special partition state SHRINKING?
>> > > > > > > > >> This state should mean that partition unavailable for
>> reads
>> > > and
>> > > > > > > updates
>> > > > > > > > >> but
>> > > > > > > > >> should keep it's update-counters and should not be
>> marked as
>> > > > lost,
>> > > > > > > > renting
>> > > > > > > > >> or evicted.
>> > > > > > > > >> At this state we able to iterate over the partition and
>> apply
>> > > > it's
>> > > > > > > > entries
>> > > > > > > > >> to another file in a compact way.
>> > > > > > > > >> Indices should be updated during the copy-on-shrink
>> procedure
>> > > or
>> > > > > at
>> > > > > > > the
>> > > > > > > > >> shrink completion.
>> > > > > > > > >> Once shrank file is ready we should replace the original
>> > > > partition
>> > > > > > > file
>> > > > > > > > >> with it and mark it as MOVING which will start the
>> historical
>> > > > > > > rebalance.
>> > > > > > > > >> Shrinking should be performed during the low activity
>> periods,
>> > > > but
>> > > > > > > even
>> > > > > > > > in
>> > > > > > > > >> case we found that activity was high and historical
>> rebalance
>> > > is
>> > > > > not
>> > > > > > > > >> suitable we may just remove the file and use regular
>> rebalance
>> > > > to
>> > > > > > > > restore
>> > > > > > > > >> the partition (this will also lead to shrink).
>> > > > > > > > >>
>> > > > > > > > >> BTW, seems, we able to implement partition shrink in a
>> cheap
>> > > > way.
>> > > > > > > > >> We may just use rebalancing code to apply fat partition's
>> > > > entries
>> > > > > to
>> > > > > > > the
>> > > > > > > > >> new file.
>> > > > > > > > >> So, 3 stages here: local rebalance, indices update and
>> global
>> > > > > > > historical
>> > > > > > > > >> rebalance.
>> > > > > > > > >>
>> > > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
>> > > > > > > > >> alexey.goncharuk@gmail.com> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Anton,
>> > > > > > > > >> >
>> > > > > > > > >> >
>> > > > > > > > >> > > >>  The solution which Anton suggested does not look
>> easy
>> > > > > > because
>> > > > > > > it
>> > > > > > > > >> will
>> > > > > > > > >> > > most likely significantly hurt performance
>> > > > > > > > >> > > Mostly agree here, but what drop do we expect? What
>> price
>> > > do
>> > > > > we
>> > > > > > > > ready
>> > > > > > > > >> to
>> > > > > > > > >> > > pay?
>> > > > > > > > >> > > Not sure, but seems some vendors ready to pay, for
>> > > example,
>> > > > 5%
>> > > > > > > drop
>> > > > > > > > >> for
>> > > > > > > > >> > > this.
>> > > > > > > > >> >
>> > > > > > > > >> > 5% may be a big drop for some use-cases, so I think we
>> > > should
>> > > > > look
>> > > > > > > at
>> > > > > > > > >> how
>> > > > > > > > >> > to improve performance, not how to make it worse.
>> > > > > > > > >> >
>> > > > > > > > >> >
>> > > > > > > > >> > >
>> > > > > > > > >> > > >> it is hard to maintain a data structure to choose
>> "page
>> > > > > from
>> > > > > > > > >> free-list
>> > > > > > > > >> > > with enough space closest to the beginning of the
>> file".
>> > > > > > > > >> > > We can just split each free-list bucket to the
>> couple and
>> > > > use
>> > > > > > > first
>> > > > > > > > >> for
>> > > > > > > > >> > > pages in the first half of the file and the second
>> for the
>> > > > > last.
>> > > > > > > > >> > > Only two buckets required here since, during the file
>> > > > shrink,
>> > > > > > > first
>> > > > > > > > >> > > bucket's window will be shrank too.
>> > > > > > > > >> > > Seems, this give us the same price on put, just use
>> the
>> > > > first
>> > > > > > > bucket
>> > > > > > > > >> in
>> > > > > > > > >> > > case it's not empty.
>> > > > > > > > >> > > Remove price (with merge) will be increased, of
>> course.
>> > > > > > > > >> > >
>> > > > > > > > >> > > The compromise solution is to have priority put (to
>> the
>> > > > first
>> > > > > > path
>> > > > > > > > of
>> > > > > > > > >> the
>> > > > > > > > >> > > file), with keeping removal as is, and schedulable
>> > > per-page
>> > > > > > > > migration
>> > > > > > > > >> for
>> > > > > > > > >> > > the rest of the data during the low activity period.
>> > > > > > > > >> > >
>> > > > > > > > >> > Free lists are large and slow by themselves, it is
>> expensive
>> > > > to
>> > > > > > > > >> checkpoint
>> > > > > > > > >> > and read them on start, so as a long-term solution I
>> would
>> > > > look
>> > > > > > into
>> > > > > > > > >> > removing them. Moreover, not sure if adding yet another
>> > > > > background
>> > > > > > > > >> process
>> > > > > > > > >> > will improve the codebase reliability and simplicity.
>> > > > > > > > >> >
>> > > > > > > > >> > If we want to go the hard path, I would look at free
>> page
>> > > > > tracking
>> > > > > > > > >> bitmap -
>> > > > > > > > >> > a special bitmask page, where each page in an adjacent
>> block
>> > > > is
>> > > > > > > marked
>> > > > > > > > >> as 0
>> > > > > > > > >> > if it has free space more than a certain configurable
>> > > > threshold
>> > > > > > > (say,
>> > > > > > > > >> 80%)
>> > > > > > > > >> > - free, and 1 if less (full). Some vendors have
>> successfully
>> > > > > > > > implemented
>> > > > > > > > >> > this approach, which looks much more promising, but
>> harder
>> > > to
>> > > > > > > > implement.
>> > > > > > > > >> >
>> > > > > > > > >> > --AG
>> > > > > > > > >> >
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > >
>> > > > > > > > > Best regards,
>> > > > > > > > > Alexei Scherbakov
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > >
>> > > > > > > > Best regards,
>> > > > > > > > Alexei Scherbakov
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> > --
>> > Sergey Kozlov
>> > GridGain Systems
>> > www.gridgain.com
>>
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexey Goncharuk <al...@gmail.com>.
Maxim,

Having a cluster-wide lock for a cache does not improve the availability of
the solution. A user cannot defragment a cache if the cache is involved in a
mission-critical operation, so locking such a cache is equivalent to shutting
down the whole cluster.

We should decide between a single offline node and a more complex fully
online solution.

пт, 4 окт. 2019 г. в 11:55, Maxim Muzafarov <mm...@apache.org>:

> Igniters,
>
> This thread seems to be endless, but we if some kind of cache group
> distributed write lock (exclusive for some of the internal Ignite
> process) will be introduced? I think it will help to solve a batch of
> problems, like:
>
> 1. defragmentation of all cache group partitions on the local node
> without concurrent updates.
> 2. improve data loading with data streamer isolation mode [1]. It
> seems we should not allow concurrent updates to cache if we on `fast
> data load` step.
> 3. recovery from a snapshot without cache stop\start actions
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-11793
>
> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com> wrote:
> >
> > Hi
> >
> > I'm not sure that node offline is a best way to do that.
> > Cons:
> >  - different caches may have different defragmentation but we force to
> stop
> > whole node
> >  - offline node is a maintenance operation will require to add +1 backup
> to
> > reduce the risk of data loss
> >  - baseline auto adjustment?
> >  - impact to index rebuild?
> >  - cache configuration changes (or destroy) during node offline
> >
> > What about other ways without node stop? E.g. make cache group on a node
> > offline? Add *defrag <cache_group> *command to control.sh to force start
> > rebalance internally in the node with expected impact to performance.
> >
> >
> >
> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org> wrote:
> >
> > > Alexey,
> > > As for me, it does not matter will it be IEP, umbrella or a single
> issue.
> > > The most important thing is Assignee :)
> > >
> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > > alexey.goncharuk@gmail.com>
> > > wrote:
> > >
> > > > Anton, do you think we should file a single ticket for this or
> should we
> > > go
> > > > with an IEP? As of now, the change does not look big enough for an
> IEP
> > > for
> > > > me.
> > > >
> > > > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
> > > >
> > > > > Alexey,
> > > > >
> > > > > Sounds good to me.
> > > > >
> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > > > > alexey.goncharuk@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Anton,
> > > > > >
> > > > > > Switching a partition to and from the SHRINKING state will
> require
> > > > > > intricate synchronizations in order to properly determine the
> start
> > > > > > position for historical rebalance without PME.
> > > > > >
> > > > > > I would still go with an offline-node approach, but instead of
> > > cleaning
> > > > > the
> > > > > > persistence, we can do effective defragmentation when the node is
> > > > offline
> > > > > > because we are sure that there is no concurrent load. After the
> > > > > > defragmentation completes, we bring the node back to the cluster
> and
> > > > > > historical rebalance will kick in automatically. It will still
> > > require
> > > > > > manual node restarts, but since the data is not removed, there
> are no
> > > > > > additional risks. Also, this will be an excellent solution for
> those
> > > > who
> > > > > > can afford downtime and execute the defragment command on all
> nodes
> > > in
> > > > > the
> > > > > > cluster simultaneously - this will be the fastest way possible.
> > > > > >
> > > > > > --AG
> > > > > >
> > > > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av...@apache.org>:
> > > > > >
> > > > > > > Alexei,
> > > > > > > >> stopping fragmented node and removing partition data, then
> > > > starting
> > > > > it
> > > > > > > again
> > > > > > >
> > > > > > > That's exactly what we're doing to solve the fragmentation
> issue.
> > > > > > > The problem here is that we have to perform N/B
> restart-rebalance
> > > > > > > operations (N - cluster size, B - backups count) and it takes
> a lot
> > > > of
> > > > > > time
> > > > > > > with risks to lose the data.
> > > > > > >
> > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > > >
> > > > > > > > Probably this should be allowed to do using public API,
> actually
> > > > this
> > > > > > is
> > > > > > > > same as manual rebalancing.
> > > > > > > >
> > > > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > > > > > > alexey.scherbakoff@gmail.com>:
> > > > > > > >
> > > > > > > > > The poor man's solution for the problem would be stopping
> > > > > fragmented
> > > > > > > node
> > > > > > > > > and removing partition data, then starting it again
> allowing
> > > full
> > > > > > state
> > > > > > > > > transfer already without deletes.
> > > > > > > > > Rinse and repeat for all owners.
> > > > > > > > >
> > > > > > > > > Anton Vinogradov, would this work for you as workaround ?
> > > > > > > > >
> > > > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <
> av@apache.org
> > > >:
> > > > > > > > >
> > > > > > > > >> Alexey,
> > > > > > > > >>
> > > > > > > > >> Let's combine your and Ivan's proposals.
> > > > > > > > >>
> > > > > > > > >> >> vacuum command, which acquires exclusive table lock,
> so no
> > > > > > > concurrent
> > > > > > > > >> activities on the table are possible.
> > > > > > > > >> and
> > > > > > > > >> >> Could the problem be solved by stopping a node which
> needs
> > > to
> > > > > be
> > > > > > > > >> defragmented, clearing persistence files and restarting
> the
> > > > node?
> > > > > > > > >> >> After rebalancing the node will receive all data back
> > > without
> > > > > > > > >> fragmentation.
> > > > > > > > >>
> > > > > > > > >> How about to have special partition state SHRINKING?
> > > > > > > > >> This state should mean that partition unavailable for
> reads
> > > and
> > > > > > > updates
> > > > > > > > >> but
> > > > > > > > >> should keep it's update-counters and should not be marked
> as
> > > > lost,
> > > > > > > > renting
> > > > > > > > >> or evicted.
> > > > > > > > >> At this state we able to iterate over the partition and
> apply
> > > > it's
> > > > > > > > entries
> > > > > > > > >> to another file in a compact way.
> > > > > > > > >> Indices should be updated during the copy-on-shrink
> procedure
> > > or
> > > > > at
> > > > > > > the
> > > > > > > > >> shrink completion.
> > > > > > > > >> Once shrank file is ready we should replace the original
> > > > partition
> > > > > > > file
> > > > > > > > >> with it and mark it as MOVING which will start the
> historical
> > > > > > > rebalance.
> > > > > > > > >> Shrinking should be performed during the low activity
> periods,
> > > > but
> > > > > > > even
> > > > > > > > in
> > > > > > > > >> case we found that activity was high and historical
> rebalance
> > > is
> > > > > not
> > > > > > > > >> suitable we may just remove the file and use regular
> rebalance
> > > > to
> > > > > > > > restore
> > > > > > > > >> the partition (this will also lead to shrink).
> > > > > > > > >>
> > > > > > > > >> BTW, seems, we able to implement partition shrink in a
> cheap
> > > > way.
> > > > > > > > >> We may just use rebalancing code to apply fat partition's
> > > > entries
> > > > > to
> > > > > > > the
> > > > > > > > >> new file.
> > > > > > > > >> So, 3 stages here: local rebalance, indices update and
> global
> > > > > > > historical
> > > > > > > > >> rebalance.
> > > > > > > > >>
> > > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > > > > > >>
> > > > > > > > >> > Anton,
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > > >>  The solution which Anton suggested does not look
> easy
> > > > > > because
> > > > > > > it
> > > > > > > > >> will
> > > > > > > > >> > > most likely significantly hurt performance
> > > > > > > > >> > > Mostly agree here, but what drop do we expect? What
> price
> > > do
> > > > > we
> > > > > > > > ready
> > > > > > > > >> to
> > > > > > > > >> > > pay?
> > > > > > > > >> > > Not sure, but seems some vendors ready to pay, for
> > > example,
> > > > 5%
> > > > > > > drop
> > > > > > > > >> for
> > > > > > > > >> > > this.
> > > > > > > > >> >
> > > > > > > > >> > 5% may be a big drop for some use-cases, so I think we
> > > should
> > > > > look
> > > > > > > at
> > > > > > > > >> how
> > > > > > > > >> > to improve performance, not how to make it worse.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > >
> > > > > > > > >> > > >> it is hard to maintain a data structure to choose
> "page
> > > > > from
> > > > > > > > >> free-list
> > > > > > > > >> > > with enough space closest to the beginning of the
> file".
> > > > > > > > >> > > We can just split each free-list bucket to the couple
> and
> > > > use
> > > > > > > first
> > > > > > > > >> for
> > > > > > > > >> > > pages in the first half of the file and the second
> for the
> > > > > last.
> > > > > > > > >> > > Only two buckets required here since, during the file
> > > > shrink,
> > > > > > > first
> > > > > > > > >> > > bucket's window will be shrank too.
> > > > > > > > >> > > Seems, this give us the same price on put, just use
> the
> > > > first
> > > > > > > bucket
> > > > > > > > >> in
> > > > > > > > >> > > case it's not empty.
> > > > > > > > >> > > Remove price (with merge) will be increased, of
> course.
> > > > > > > > >> > >
> > > > > > > > >> > > The compromise solution is to have priority put (to
> the
> > > > first
> > > > > > path
> > > > > > > > of
> > > > > > > > >> the
> > > > > > > > >> > > file), with keeping removal as is, and schedulable
> > > per-page
> > > > > > > > migration
> > > > > > > > >> for
> > > > > > > > >> > > the rest of the data during the low activity period.
> > > > > > > > >> > >
> > > > > > > > >> > Free lists are large and slow by themselves, it is
> expensive
> > > > to
> > > > > > > > >> checkpoint
> > > > > > > > >> > and read them on start, so as a long-term solution I
> would
> > > > look
> > > > > > into
> > > > > > > > >> > removing them. Moreover, not sure if adding yet another
> > > > > background
> > > > > > > > >> process
> > > > > > > > >> > will improve the codebase reliability and simplicity.
> > > > > > > > >> >
> > > > > > > > >> > If we want to go the hard path, I would look at free
> page
> > > > > tracking
> > > > > > > > >> bitmap -
> > > > > > > > >> > a special bitmask page, where each page in an adjacent
> block
> > > > is
> > > > > > > marked
> > > > > > > > >> as 0
> > > > > > > > >> > if it has free space more than a certain configurable
> > > > threshold
> > > > > > > (say,
> > > > > > > > >> 80%)
> > > > > > > > >> > - free, and 1 if less (full). Some vendors have
> successfully
> > > > > > > > implemented
> > > > > > > > >> > this approach, which looks much more promising, but
> harder
> > > to
> > > > > > > > implement.
> > > > > > > > >> >
> > > > > > > > >> > --AG
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Alexei Scherbakov
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Alexei Scherbakov
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Maxim Muzafarov <mm...@apache.org>.
Igniters,

This thread seems to be endless, but what if we introduce some kind of
distributed cache group write lock (exclusive to some of the internal
Ignite processes)? I think it will help to solve a batch of problems
like these (a usage sketch follows the [1] reference below):

1. defragmentation of all cache group partitions on the local node
without concurrent updates.
2. improving data loading with data streamer isolation mode [1]. It
seems we should not allow concurrent updates to a cache while we are on
the `fast data load` step.
3. recovery from a snapshot without cache stop/start actions.


[1] https://issues.apache.org/jira/browse/IGNITE-11793
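
A rough sketch of how such an exclusive cache group lock could be used from
the defragmentation side. The lockCacheGroup()/unlock() API below is purely
hypothetical (it does not exist in Ignite); it only illustrates the intended
"block user updates, run the internal process, release" flow.

  // Hypothetical usage of a distributed cache group write lock (sketch only).
  public class CacheGroupDefragmentation {
      /**
       * Defragments all local partitions of the given cache group while
       * user updates to that group are blocked cluster-wide.
       */
      public static void defragment(org.apache.ignite.Ignite ignite, String cacheGroup) {
          // Hypothetical API: acquire an exclusive, cluster-wide write lock
          // for the cache group so no concurrent user updates are possible.
          CacheGroupLock lock = lockCacheGroup(ignite, cacheGroup);

          try {
              // Internal process runs here: copy live entries of each local
              // partition into a compact file, update indexes, swap files.
              defragmentLocalPartitions(ignite, cacheGroup);
          }
          finally {
              // Release the lock so user load and rebalance can resume.
              lock.unlock();
          }
      }

      // Hypothetical helpers, declared only to keep the sketch compilable.
      interface CacheGroupLock { void unlock(); }

      static CacheGroupLock lockCacheGroup(org.apache.ignite.Ignite ignite, String grp) {
          throw new UnsupportedOperationException("proposed API, not implemented");
      }

      static void defragmentLocalPartitions(org.apache.ignite.Ignite ignite, String grp) {
          throw new UnsupportedOperationException("proposed internal process");
      }
  }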

On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov <sk...@gridgain.com> wrote:
>
> Hi
>
> I'm not sure that node offline is a best way to do that.
> Cons:
>  - different caches may have different defragmentation but we force to stop
> whole node
>  - offline node is a maintenance operation will require to add +1 backup to
> reduce the risk of data loss
>  - baseline auto adjustment?
>  - impact to index rebuild?
>  - cache configuration changes (or destroy) during node offline
>
> What about other ways without node stop? E.g. make cache group on a node
> offline? Add *defrag <cache_group> *command to control.sh to force start
> rebalance internally in the node with expected impact to performance.
>
>
>
> On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org> wrote:
>
> > Alexey,
> > As for me, it does not matter will it be IEP, umbrella or a single issue.
> > The most important thing is Assignee :)
> >
> > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > alexey.goncharuk@gmail.com>
> > wrote:
> >
> > > Anton, do you think we should file a single ticket for this or should we
> > go
> > > with an IEP? As of now, the change does not look big enough for an IEP
> > for
> > > me.
> > >
> > > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
> > >
> > > > Alexey,
> > > >
> > > > Sounds good to me.
> > > >
> > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > > > alexey.goncharuk@gmail.com>
> > > > wrote:
> > > >
> > > > > Anton,
> > > > >
> > > > > Switching a partition to and from the SHRINKING state will require
> > > > > intricate synchronizations in order to properly determine the start
> > > > > position for historical rebalance without PME.
> > > > >
> > > > > I would still go with an offline-node approach, but instead of
> > cleaning
> > > > the
> > > > > persistence, we can do effective defragmentation when the node is
> > > offline
> > > > > because we are sure that there is no concurrent load. After the
> > > > > defragmentation completes, we bring the node back to the cluster and
> > > > > historical rebalance will kick in automatically. It will still
> > require
> > > > > manual node restarts, but since the data is not removed, there are no
> > > > > additional risks. Also, this will be an excellent solution for those
> > > who
> > > > > can afford downtime and execute the defragment command on all nodes
> > in
> > > > the
> > > > > cluster simultaneously - this will be the fastest way possible.
> > > > >
> > > > > --AG
> > > > >
> > > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av...@apache.org>:
> > > > >
> > > > > > Alexei,
> > > > > > >> stopping fragmented node and removing partition data, then
> > > starting
> > > > it
> > > > > > again
> > > > > >
> > > > > > That's exactly what we're doing to solve the fragmentation issue.
> > > > > > The problem here is that we have to perform N/B restart-rebalance
> > > > > > operations (N - cluster size, B - backups count) and it takes a lot
> > > of
> > > > > time
> > > > > > with risks to lose the data.
> > > > > >
> > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > >
> > > > > > > Probably this should be allowed to do using public API, actually
> > > this
> > > > > is
> > > > > > > same as manual rebalancing.
> > > > > > >
> > > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > > > > > alexey.scherbakoff@gmail.com>:
> > > > > > >
> > > > > > > > The poor man's solution for the problem would be stopping
> > > > fragmented
> > > > > > node
> > > > > > > > and removing partition data, then starting it again allowing
> > full
> > > > > state
> > > > > > > > transfer already without deletes.
> > > > > > > > Rinse and repeat for all owners.
> > > > > > > >
> > > > > > > > Anton Vinogradov, would this work for you as workaround ?
> > > > > > > >
> > > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av@apache.org
> > >:
> > > > > > > >
> > > > > > > >> Alexey,
> > > > > > > >>
> > > > > > > >> Let's combine your and Ivan's proposals.
> > > > > > > >>
> > > > > > > >> >> vacuum command, which acquires exclusive table lock, so no
> > > > > > concurrent
> > > > > > > >> activities on the table are possible.
> > > > > > > >> and
> > > > > > > >> >> Could the problem be solved by stopping a node which needs
> > to
> > > > be
> > > > > > > >> defragmented, clearing persistence files and restarting the
> > > node?
> > > > > > > >> >> After rebalancing the node will receive all data back
> > without
> > > > > > > >> fragmentation.
> > > > > > > >>
> > > > > > > >> How about to have special partition state SHRINKING?
> > > > > > > >> This state should mean that partition unavailable for reads
> > and
> > > > > > updates
> > > > > > > >> but
> > > > > > > >> should keep it's update-counters and should not be marked as
> > > lost,
> > > > > > > renting
> > > > > > > >> or evicted.
> > > > > > > >> At this state we able to iterate over the partition and apply
> > > it's
> > > > > > > entries
> > > > > > > >> to another file in a compact way.
> > > > > > > >> Indices should be updated during the copy-on-shrink procedure
> > or
> > > > at
> > > > > > the
> > > > > > > >> shrink completion.
> > > > > > > >> Once shrank file is ready we should replace the original
> > > partition
> > > > > > file
> > > > > > > >> with it and mark it as MOVING which will start the historical
> > > > > > rebalance.
> > > > > > > >> Shrinking should be performed during the low activity periods,
> > > but
> > > > > > even
> > > > > > > in
> > > > > > > >> case we found that activity was high and historical rebalance
> > is
> > > > not
> > > > > > > >> suitable we may just remove the file and use regular rebalance
> > > to
> > > > > > > restore
> > > > > > > >> the partition (this will also lead to shrink).
> > > > > > > >>
> > > > > > > >> BTW, seems, we able to implement partition shrink in a cheap
> > > way.
> > > > > > > >> We may just use rebalancing code to apply fat partition's
> > > entries
> > > > to
> > > > > > the
> > > > > > > >> new file.
> > > > > > > >> So, 3 stages here: local rebalance, indices update and global
> > > > > > historical
> > > > > > > >> rebalance.
> > > > > > > >>
> > > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > Anton,
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > > >>  The solution which Anton suggested does not look easy
> > > > > because
> > > > > > it
> > > > > > > >> will
> > > > > > > >> > > most likely significantly hurt performance
> > > > > > > >> > > Mostly agree here, but what drop do we expect? What price
> > do
> > > > we
> > > > > > > ready
> > > > > > > >> to
> > > > > > > >> > > pay?
> > > > > > > >> > > Not sure, but seems some vendors ready to pay, for
> > example,
> > > 5%
> > > > > > drop
> > > > > > > >> for
> > > > > > > >> > > this.
> > > > > > > >> >
> > > > > > > >> > 5% may be a big drop for some use-cases, so I think we
> > should
> > > > look
> > > > > > at
> > > > > > > >> how
> > > > > > > >> > to improve performance, not how to make it worse.
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > >
> > > > > > > >> > > >> it is hard to maintain a data structure to choose "page
> > > > from
> > > > > > > >> free-list
> > > > > > > >> > > with enough space closest to the beginning of the file".
> > > > > > > >> > > We can just split each free-list bucket to the couple and
> > > use
> > > > > > first
> > > > > > > >> for
> > > > > > > >> > > pages in the first half of the file and the second for the
> > > > last.
> > > > > > > >> > > Only two buckets required here since, during the file
> > > shrink,
> > > > > > first
> > > > > > > >> > > bucket's window will be shrank too.
> > > > > > > >> > > Seems, this give us the same price on put, just use the
> > > first
> > > > > > bucket
> > > > > > > >> in
> > > > > > > >> > > case it's not empty.
> > > > > > > >> > > Remove price (with merge) will be increased, of course.
> > > > > > > >> > >
> > > > > > > >> > > The compromise solution is to have priority put (to the
> > > first
> > > > > path
> > > > > > > of
> > > > > > > >> the
> > > > > > > >> > > file), with keeping removal as is, and schedulable
> > per-page
> > > > > > > migration
> > > > > > > >> for
> > > > > > > >> > > the rest of the data during the low activity period.
> > > > > > > >> > >
> > > > > > > >> > Free lists are large and slow by themselves, it is expensive
> > > to
> > > > > > > >> checkpoint
> > > > > > > >> > and read them on start, so as a long-term solution I would
> > > look
> > > > > into
> > > > > > > >> > removing them. Moreover, not sure if adding yet another
> > > > background
> > > > > > > >> process
> > > > > > > >> > will improve the codebase reliability and simplicity.
> > > > > > > >> >
> > > > > > > >> > If we want to go the hard path, I would look at free page
> > > > tracking
> > > > > > > >> bitmap -
> > > > > > > >> > a special bitmask page, where each page in an adjacent block
> > > is
> > > > > > marked
> > > > > > > >> as 0
> > > > > > > >> > if it has free space more than a certain configurable
> > > threshold
> > > > > > (say,
> > > > > > > >> 80%)
> > > > > > > >> > - free, and 1 if less (full). Some vendors have successfully
> > > > > > > implemented
> > > > > > > >> > this approach, which looks much more promising, but harder
> > to
> > > > > > > implement.
> > > > > > > >> >
> > > > > > > >> > --AG
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Alexei Scherbakov
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Sergey Kozlov <sk...@gridgain.com>.
Hi

I'm not sure that taking a node offline is the best way to do that.
Cons:
 - different caches may have different levels of fragmentation, but we are
forced to stop the whole node
 - taking a node offline is a maintenance operation that will require adding
+1 backup to reduce the risk of data loss
 - baseline auto adjustment?
 - impact on index rebuild?
 - cache configuration changes (or destroy) while the node is offline

What about other ways that do not require stopping the node? E.g. take only
a cache group on a node offline? Or add a *defrag <cache_group>* command to
control.sh that force-starts rebalancing internally on the node, with an
expected impact on performance.
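
A minimal, purely hypothetical sketch of how such a per-cache-group trigger
could also look through the Java API; defragmentCacheGroup() does not exist
in Ignite and is named here only to illustrate the proposal:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class DefragTriggerSketch {
    public static void main(String[] args) {
        // Start (or connect to) a node using an existing configuration file.
        try (Ignite ignite = Ignition.start("config/ignite-config.xml")) {
            // Hypothetical call mirroring the proposed
            // "control.sh defrag <cache_group>" command: compact this cache
            // group in place, without stopping the whole node.
            // ignite.cluster().defragmentCacheGroup("myCacheGroup");
        }
    }
}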



On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av...@apache.org> wrote:

> Alexey,
> As for me, it does not matter will it be IEP, umbrella or a single issue.
> The most important thing is Assignee :)
>
> On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> alexey.goncharuk@gmail.com>
> wrote:
>
> > Anton, do you think we should file a single ticket for this or should we
> go
> > with an IEP? As of now, the change does not look big enough for an IEP
> for
> > me.
> >
> > чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
> >
> > > Alexey,
> > >
> > > Sounds good to me.
> > >
> > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > > alexey.goncharuk@gmail.com>
> > > wrote:
> > >
> > > > Anton,
> > > >
> > > > Switching a partition to and from the SHRINKING state will require
> > > > intricate synchronizations in order to properly determine the start
> > > > position for historical rebalance without PME.
> > > >
> > > > I would still go with an offline-node approach, but instead of
> cleaning
> > > the
> > > > persistence, we can do effective defragmentation when the node is
> > offline
> > > > because we are sure that there is no concurrent load. After the
> > > > defragmentation completes, we bring the node back to the cluster and
> > > > historical rebalance will kick in automatically. It will still
> require
> > > > manual node restarts, but since the data is not removed, there are no
> > > > additional risks. Also, this will be an excellent solution for those
> > who
> > > > can afford downtime and execute the defragment command on all nodes
> in
> > > the
> > > > cluster simultaneously - this will be the fastest way possible.
> > > >
> > > > --AG
> > > >
> > > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av...@apache.org>:
> > > >
> > > > > Alexei,
> > > > > >> stopping fragmented node and removing partition data, then
> > starting
> > > it
> > > > > again
> > > > >
> > > > > That's exactly what we're doing to solve the fragmentation issue.
> > > > > The problem here is that we have to perform N/B restart-rebalance
> > > > > operations (N - cluster size, B - backups count) and it takes a lot
> > of
> > > > time
> > > > > with risks to lose the data.
> > > > >
> > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >
> > > > > > Probably this should be allowed to do using public API, actually
> > this
> > > > is
> > > > > > same as manual rebalancing.
> > > > > >
> > > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > > > > alexey.scherbakoff@gmail.com>:
> > > > > >
> > > > > > > The poor man's solution for the problem would be stopping
> > > fragmented
> > > > > node
> > > > > > > and removing partition data, then starting it again allowing
> full
> > > > state
> > > > > > > transfer already without deletes.
> > > > > > > Rinse and repeat for all owners.
> > > > > > >
> > > > > > > Anton Vinogradov, would this work for you as workaround ?
> > > > > > >
> > > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av@apache.org
> >:
> > > > > > >
> > > > > > >> Alexey,
> > > > > > >>
> > > > > > >> Let's combine your and Ivan's proposals.
> > > > > > >>
> > > > > > >> >> vacuum command, which acquires exclusive table lock, so no
> > > > > concurrent
> > > > > > >> activities on the table are possible.
> > > > > > >> and
> > > > > > >> >> Could the problem be solved by stopping a node which needs
> to
> > > be
> > > > > > >> defragmented, clearing persistence files and restarting the
> > node?
> > > > > > >> >> After rebalancing the node will receive all data back
> without
> > > > > > >> fragmentation.
> > > > > > >>
> > > > > > >> How about to have special partition state SHRINKING?
> > > > > > >> This state should mean that partition unavailable for reads
> and
> > > > > updates
> > > > > > >> but
> > > > > > >> should keep it's update-counters and should not be marked as
> > lost,
> > > > > > renting
> > > > > > >> or evicted.
> > > > > > >> At this state we able to iterate over the partition and apply
> > it's
> > > > > > entries
> > > > > > >> to another file in a compact way.
> > > > > > >> Indices should be updated during the copy-on-shrink procedure
> or
> > > at
> > > > > the
> > > > > > >> shrink completion.
> > > > > > >> Once shrank file is ready we should replace the original
> > partition
> > > > > file
> > > > > > >> with it and mark it as MOVING which will start the historical
> > > > > rebalance.
> > > > > > >> Shrinking should be performed during the low activity periods,
> > but
> > > > > even
> > > > > > in
> > > > > > >> case we found that activity was high and historical rebalance
> is
> > > not
> > > > > > >> suitable we may just remove the file and use regular rebalance
> > to
> > > > > > restore
> > > > > > >> the partition (this will also lead to shrink).
> > > > > > >>
> > > > > > >> BTW, seems, we able to implement partition shrink in a cheap
> > way.
> > > > > > >> We may just use rebalancing code to apply fat partition's
> > entries
> > > to
> > > > > the
> > > > > > >> new file.
> > > > > > >> So, 3 stages here: local rebalance, indices update and global
> > > > > historical
> > > > > > >> rebalance.
> > > > > > >>
> > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Anton,
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > > >>  The solution which Anton suggested does not look easy
> > > > because
> > > > > it
> > > > > > >> will
> > > > > > >> > > most likely significantly hurt performance
> > > > > > >> > > Mostly agree here, but what drop do we expect? What price
> do
> > > we
> > > > > > ready
> > > > > > >> to
> > > > > > >> > > pay?
> > > > > > >> > > Not sure, but seems some vendors ready to pay, for
> example,
> > 5%
> > > > > drop
> > > > > > >> for
> > > > > > >> > > this.
> > > > > > >> >
> > > > > > >> > 5% may be a big drop for some use-cases, so I think we
> should
> > > look
> > > > > at
> > > > > > >> how
> > > > > > >> > to improve performance, not how to make it worse.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > >
> > > > > > >> > > >> it is hard to maintain a data structure to choose "page
> > > from
> > > > > > >> free-list
> > > > > > >> > > with enough space closest to the beginning of the file".
> > > > > > >> > > We can just split each free-list bucket to the couple and
> > use
> > > > > first
> > > > > > >> for
> > > > > > >> > > pages in the first half of the file and the second for the
> > > last.
> > > > > > >> > > Only two buckets required here since, during the file
> > shrink,
> > > > > first
> > > > > > >> > > bucket's window will be shrank too.
> > > > > > >> > > Seems, this give us the same price on put, just use the
> > first
> > > > > bucket
> > > > > > >> in
> > > > > > >> > > case it's not empty.
> > > > > > >> > > Remove price (with merge) will be increased, of course.
> > > > > > >> > >
> > > > > > >> > > The compromise solution is to have priority put (to the
> > first
> > > > path
> > > > > > of
> > > > > > >> the
> > > > > > >> > > file), with keeping removal as is, and schedulable
> per-page
> > > > > > migration
> > > > > > >> for
> > > > > > >> > > the rest of the data during the low activity period.
> > > > > > >> > >
> > > > > > >> > Free lists are large and slow by themselves, it is expensive
> > to
> > > > > > >> checkpoint
> > > > > > >> > and read them on start, so as a long-term solution I would
> > look
> > > > into
> > > > > > >> > removing them. Moreover, not sure if adding yet another
> > > background
> > > > > > >> process
> > > > > > >> > will improve the codebase reliability and simplicity.
> > > > > > >> >
> > > > > > >> > If we want to go the hard path, I would look at free page
> > > tracking
> > > > > > >> bitmap -
> > > > > > >> > a special bitmask page, where each page in an adjacent block
> > is
> > > > > marked
> > > > > > >> as 0
> > > > > > >> > if it has free space more than a certain configurable
> > threshold
> > > > > (say,
> > > > > > >> 80%)
> > > > > > >> > - free, and 1 if less (full). Some vendors have successfully
> > > > > > implemented
> > > > > > >> > this approach, which looks much more promising, but harder
> to
> > > > > > implement.
> > > > > > >> >
> > > > > > >> > --AG
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Anton Vinogradov <av...@apache.org>.
Alexey,
As for me, it does not matter whether it will be an IEP, an umbrella ticket
or a single issue.
The most important thing is the Assignee :)

On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <al...@gmail.com>
wrote:

> Anton, do you think we should file a single ticket for this or should we go
> with an IEP? As of now, the change does not look big enough for an IEP for
> me.
>
> чт, 3 окт. 2019 г. в 11:18, Anton Vinogradov <av...@apache.org>:
>
> > Alexey,
> >
> > Sounds good to me.
> >
> > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > alexey.goncharuk@gmail.com>
> > wrote:
> >
> > > Anton,
> > >
> > > Switching a partition to and from the SHRINKING state will require
> > > intricate synchronizations in order to properly determine the start
> > > position for historical rebalance without PME.
> > >
> > > I would still go with an offline-node approach, but instead of cleaning
> > the
> > > persistence, we can do effective defragmentation when the node is
> offline
> > > because we are sure that there is no concurrent load. After the
> > > defragmentation completes, we bring the node back to the cluster and
> > > historical rebalance will kick in automatically. It will still require
> > > manual node restarts, but since the data is not removed, there are no
> > > additional risks. Also, this will be an excellent solution for those
> who
> > > can afford downtime and execute the defragment command on all nodes in
> > the
> > > cluster simultaneously - this will be the fastest way possible.
> > >
> > > --AG
> > >
> > > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av...@apache.org>:
> > >
> > > > Alexei,
> > > > >> stopping fragmented node and removing partition data, then
> starting
> > it
> > > > again
> > > >
> > > > That's exactly what we're doing to solve the fragmentation issue.
> > > > The problem here is that we have to perform N/B restart-rebalance
> > > > operations (N - cluster size, B - backups count) and it takes a lot
> of
> > > time
> > > > with risks to lose the data.
> > > >
> > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com> wrote:
> > > >
> > > > > Probably this should be allowed to do using public API, actually
> this
> > > is
> > > > > same as manual rebalancing.
> > > > >
> > > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com>:
> > > > >
> > > > > > The poor man's solution for the problem would be stopping
> > fragmented
> > > > node
> > > > > > and removing partition data, then starting it again allowing full
> > > state
> > > > > > transfer already without deletes.
> > > > > > Rinse and repeat for all owners.
> > > > > >
> > > > > > Anton Vinogradov, would this work for you as workaround ?
> > > > > >
> > > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av...@apache.org>:
> > > > > >
> > > > > >> Alexey,
> > > > > >>
> > > > > >> Let's combine your and Ivan's proposals.
> > > > > >>
> > > > > >> >> vacuum command, which acquires exclusive table lock, so no
> > > > concurrent
> > > > > >> activities on the table are possible.
> > > > > >> and
> > > > > >> >> Could the problem be solved by stopping a node which needs to
> > be
> > > > > >> defragmented, clearing persistence files and restarting the
> node?
> > > > > >> >> After rebalancing the node will receive all data back without
> > > > > >> fragmentation.
> > > > > >>
> > > > > >> How about to have special partition state SHRINKING?
> > > > > >> This state should mean that partition unavailable for reads and
> > > > updates
> > > > > >> but
> > > > > >> should keep it's update-counters and should not be marked as
> lost,
> > > > > renting
> > > > > >> or evicted.
> > > > > >> At this state we able to iterate over the partition and apply
> it's
> > > > > entries
> > > > > >> to another file in a compact way.
> > > > > >> Indices should be updated during the copy-on-shrink procedure or
> > at
> > > > the
> > > > > >> shrink completion.
> > > > > >> Once shrank file is ready we should replace the original
> partition
> > > > file
> > > > > >> with it and mark it as MOVING which will start the historical
> > > > rebalance.
> > > > > >> Shrinking should be performed during the low activity periods,
> but
> > > > even
> > > > > in
> > > > > >> case we found that activity was high and historical rebalance is
> > not
> > > > > >> suitable we may just remove the file and use regular rebalance
> to
> > > > > restore
> > > > > >> the partition (this will also lead to shrink).
> > > > > >>
> > > > > >> BTW, seems, we able to implement partition shrink in a cheap
> way.
> > > > > >> We may just use rebalancing code to apply fat partition's
> entries
> > to
> > > > the
> > > > > >> new file.
> > > > > >> So, 3 stages here: local rebalance, indices update and global
> > > > historical
> > > > > >> rebalance.
> > > > > >>
> > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > > >>
> > > > > >> > Anton,
> > > > > >> >
> > > > > >> >
> > > > > >> > > >>  The solution which Anton suggested does not look easy
> > > because
> > > > it
> > > > > >> will
> > > > > >> > > most likely significantly hurt performance
> > > > > >> > > Mostly agree here, but what drop do we expect? What price do
> > we
> > > > > ready
> > > > > >> to
> > > > > >> > > pay?
> > > > > >> > > Not sure, but seems some vendors ready to pay, for example,
> 5%
> > > > drop
> > > > > >> for
> > > > > >> > > this.
> > > > > >> >
> > > > > >> > 5% may be a big drop for some use-cases, so I think we should
> > look
> > > > at
> > > > > >> how
> > > > > >> > to improve performance, not how to make it worse.
> > > > > >> >
> > > > > >> >
> > > > > >> > >
> > > > > >> > > >> it is hard to maintain a data structure to choose "page
> > from
> > > > > >> free-list
> > > > > >> > > with enough space closest to the beginning of the file".
> > > > > >> > > We can just split each free-list bucket to the couple and
> use
> > > > first
> > > > > >> for
> > > > > >> > > pages in the first half of the file and the second for the
> > last.
> > > > > >> > > Only two buckets required here since, during the file
> shrink,
> > > > first
> > > > > >> > > bucket's window will be shrank too.
> > > > > >> > > Seems, this give us the same price on put, just use the
> first
> > > > bucket
> > > > > >> in
> > > > > >> > > case it's not empty.
> > > > > >> > > Remove price (with merge) will be increased, of course.
> > > > > >> > >
> > > > > >> > > The compromise solution is to have priority put (to the
> first
> > > path
> > > > > of
> > > > > >> the
> > > > > >> > > file), with keeping removal as is, and schedulable per-page
> > > > > migration
> > > > > >> for
> > > > > >> > > the rest of the data during the low activity period.
> > > > > >> > >
> > > > > >> > Free lists are large and slow by themselves, it is expensive
> to
> > > > > >> checkpoint
> > > > > >> > and read them on start, so as a long-term solution I would
> look
> > > into
> > > > > >> > removing them. Moreover, not sure if adding yet another
> > background
> > > > > >> process
> > > > > >> > will improve the codebase reliability and simplicity.
> > > > > >> >
> > > > > >> > If we want to go the hard path, I would look at free page
> > tracking
> > > > > >> bitmap -
> > > > > >> > a special bitmask page, where each page in an adjacent block
> is
> > > > marked
> > > > > >> as 0
> > > > > >> > if it has free space more than a certain configurable
> threshold
> > > > (say,
> > > > > >> 80%)
> > > > > >> > - free, and 1 if less (full). Some vendors have successfully
> > > > > implemented
> > > > > >> > this approach, which looks much more promising, but harder to
> > > > > implement.
> > > > > >> >
> > > > > >> > --AG
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > >
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexey Goncharuk <al...@gmail.com>.
Anton, do you think we should file a single ticket for this or should we go
with an IEP? As of now, the change does not look big enough to me to justify
an IEP.

Thu, Oct 3, 2019 at 11:18, Anton Vinogradov <av...@apache.org>:

> Alexey,
>
> Sounds good to me.
>
> On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> alexey.goncharuk@gmail.com>
> wrote:
>
> > Anton,
> >
> > Switching a partition to and from the SHRINKING state will require
> > intricate synchronizations in order to properly determine the start
> > position for historical rebalance without PME.
> >
> > I would still go with an offline-node approach, but instead of cleaning
> the
> > persistence, we can do effective defragmentation when the node is offline
> > because we are sure that there is no concurrent load. After the
> > defragmentation completes, we bring the node back to the cluster and
> > historical rebalance will kick in automatically. It will still require
> > manual node restarts, but since the data is not removed, there are no
> > additional risks. Also, this will be an excellent solution for those who
> > can afford downtime and execute the defragment command on all nodes in
> the
> > cluster simultaneously - this will be the fastest way possible.
> >
> > --AG
> >
> > пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av...@apache.org>:
> >
> > > Alexei,
> > > >> stopping fragmented node and removing partition data, then starting
> it
> > > again
> > >
> > > That's exactly what we're doing to solve the fragmentation issue.
> > > The problem here is that we have to perform N/B restart-rebalance
> > > operations (N - cluster size, B - backups count) and it takes a lot of
> > time
> > > with risks to lose the data.
> > >
> > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com> wrote:
> > >
> > > > Probably this should be allowed to do using public API, actually this
> > is
> > > > same as manual rebalancing.
> > > >
> > > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com>:
> > > >
> > > > > The poor man's solution for the problem would be stopping
> fragmented
> > > node
> > > > > and removing partition data, then starting it again allowing full
> > state
> > > > > transfer already without deletes.
> > > > > Rinse and repeat for all owners.
> > > > >
> > > > > Anton Vinogradov, would this work for you as workaround ?
> > > > >
> > > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av...@apache.org>:
> > > > >
> > > > >> Alexey,
> > > > >>
> > > > >> Let's combine your and Ivan's proposals.
> > > > >>
> > > > >> >> vacuum command, which acquires exclusive table lock, so no
> > > concurrent
> > > > >> activities on the table are possible.
> > > > >> and
> > > > >> >> Could the problem be solved by stopping a node which needs to
> be
> > > > >> defragmented, clearing persistence files and restarting the node?
> > > > >> >> After rebalancing the node will receive all data back without
> > > > >> fragmentation.
> > > > >>
> > > > >> How about to have special partition state SHRINKING?
> > > > >> This state should mean that partition unavailable for reads and
> > > updates
> > > > >> but
> > > > >> should keep it's update-counters and should not be marked as lost,
> > > > renting
> > > > >> or evicted.
> > > > >> At this state we able to iterate over the partition and apply it's
> > > > entries
> > > > >> to another file in a compact way.
> > > > >> Indices should be updated during the copy-on-shrink procedure or
> at
> > > the
> > > > >> shrink completion.
> > > > >> Once shrank file is ready we should replace the original partition
> > > file
> > > > >> with it and mark it as MOVING which will start the historical
> > > rebalance.
> > > > >> Shrinking should be performed during the low activity periods, but
> > > even
> > > > in
> > > > >> case we found that activity was high and historical rebalance is
> not
> > > > >> suitable we may just remove the file and use regular rebalance to
> > > > restore
> > > > >> the partition (this will also lead to shrink).
> > > > >>
> > > > >> BTW, seems, we able to implement partition shrink in a cheap way.
> > > > >> We may just use rebalancing code to apply fat partition's entries
> to
> > > the
> > > > >> new file.
> > > > >> So, 3 stages here: local rebalance, indices update and global
> > > historical
> > > > >> rebalance.
> > > > >>
> > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > >>
> > > > >> > Anton,
> > > > >> >
> > > > >> >
> > > > >> > > >>  The solution which Anton suggested does not look easy
> > because
> > > it
> > > > >> will
> > > > >> > > most likely significantly hurt performance
> > > > >> > > Mostly agree here, but what drop do we expect? What price do
> we
> > > > ready
> > > > >> to
> > > > >> > > pay?
> > > > >> > > Not sure, but seems some vendors ready to pay, for example, 5%
> > > drop
> > > > >> for
> > > > >> > > this.
> > > > >> >
> > > > >> > 5% may be a big drop for some use-cases, so I think we should
> look
> > > at
> > > > >> how
> > > > >> > to improve performance, not how to make it worse.
> > > > >> >
> > > > >> >
> > > > >> > >
> > > > >> > > >> it is hard to maintain a data structure to choose "page
> from
> > > > >> free-list
> > > > >> > > with enough space closest to the beginning of the file".
> > > > >> > > We can just split each free-list bucket to the couple and use
> > > first
> > > > >> for
> > > > >> > > pages in the first half of the file and the second for the
> last.
> > > > >> > > Only two buckets required here since, during the file shrink,
> > > first
> > > > >> > > bucket's window will be shrank too.
> > > > >> > > Seems, this give us the same price on put, just use the first
> > > bucket
> > > > >> in
> > > > >> > > case it's not empty.
> > > > >> > > Remove price (with merge) will be increased, of course.
> > > > >> > >
> > > > >> > > The compromise solution is to have priority put (to the first
> > path
> > > > of
> > > > >> the
> > > > >> > > file), with keeping removal as is, and schedulable per-page
> > > > migration
> > > > >> for
> > > > >> > > the rest of the data during the low activity period.
> > > > >> > >
> > > > >> > Free lists are large and slow by themselves, it is expensive to
> > > > >> checkpoint
> > > > >> > and read them on start, so as a long-term solution I would look
> > into
> > > > >> > removing them. Moreover, not sure if adding yet another
> background
> > > > >> process
> > > > >> > will improve the codebase reliability and simplicity.
> > > > >> >
> > > > >> > If we want to go the hard path, I would look at free page
> tracking
> > > > >> bitmap -
> > > > >> > a special bitmask page, where each page in an adjacent block is
> > > marked
> > > > >> as 0
> > > > >> > if it has free space more than a certain configurable threshold
> > > (say,
> > > > >> 80%)
> > > > >> > - free, and 1 if less (full). Some vendors have successfully
> > > > implemented
> > > > >> > this approach, which looks much more promising, but harder to
> > > > implement.
> > > > >> >
> > > > >> > --AG
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Anton Vinogradov <av...@apache.org>.
Alexey,

Sounds good to me.

On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <al...@gmail.com>
wrote:

> Anton,
>
> Switching a partition to and from the SHRINKING state will require
> intricate synchronizations in order to properly determine the start
> position for historical rebalance without PME.
>
> I would still go with an offline-node approach, but instead of cleaning the
> persistence, we can do effective defragmentation when the node is offline
> because we are sure that there is no concurrent load. After the
> defragmentation completes, we bring the node back to the cluster and
> historical rebalance will kick in automatically. It will still require
> manual node restarts, but since the data is not removed, there are no
> additional risks. Also, this will be an excellent solution for those who
> can afford downtime and execute the defragment command on all nodes in the
> cluster simultaneously - this will be the fastest way possible.
>
> --AG
>
> пн, 30 сент. 2019 г. в 09:29, Anton Vinogradov <av...@apache.org>:
>
> > Alexei,
> > >> stopping fragmented node and removing partition data, then starting it
> > again
> >
> > That's exactly what we're doing to solve the fragmentation issue.
> > The problem here is that we have to perform N/B restart-rebalance
> > operations (N - cluster size, B - backups count) and it takes a lot of
> time
> > with risks to lose the data.
> >
> > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com> wrote:
> >
> > > Probably this should be allowed to do using public API, actually this
> is
> > > same as manual rebalancing.
> > >
> > > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com>:
> > >
> > > > The poor man's solution for the problem would be stopping fragmented
> > node
> > > > and removing partition data, then starting it again allowing full
> state
> > > > transfer already without deletes.
> > > > Rinse and repeat for all owners.
> > > >
> > > > Anton Vinogradov, would this work for you as workaround ?
> > > >
> > > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av...@apache.org>:
> > > >
> > > >> Alexey,
> > > >>
> > > >> Let's combine your and Ivan's proposals.
> > > >>
> > > >> >> vacuum command, which acquires exclusive table lock, so no
> > concurrent
> > > >> activities on the table are possible.
> > > >> and
> > > >> >> Could the problem be solved by stopping a node which needs to be
> > > >> defragmented, clearing persistence files and restarting the node?
> > > >> >> After rebalancing the node will receive all data back without
> > > >> fragmentation.
> > > >>
> > > >> How about to have special partition state SHRINKING?
> > > >> This state should mean that partition unavailable for reads and
> > updates
> > > >> but
> > > >> should keep it's update-counters and should not be marked as lost,
> > > renting
> > > >> or evicted.
> > > >> At this state we able to iterate over the partition and apply it's
> > > entries
> > > >> to another file in a compact way.
> > > >> Indices should be updated during the copy-on-shrink procedure or at
> > the
> > > >> shrink completion.
> > > >> Once shrank file is ready we should replace the original partition
> > file
> > > >> with it and mark it as MOVING which will start the historical
> > rebalance.
> > > >> Shrinking should be performed during the low activity periods, but
> > even
> > > in
> > > >> case we found that activity was high and historical rebalance is not
> > > >> suitable we may just remove the file and use regular rebalance to
> > > restore
> > > >> the partition (this will also lead to shrink).
> > > >>
> > > >> BTW, seems, we able to implement partition shrink in a cheap way.
> > > >> We may just use rebalancing code to apply fat partition's entries to
> > the
> > > >> new file.
> > > >> So, 3 stages here: local rebalance, indices update and global
> > historical
> > > >> rebalance.
> > > >>
> > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > >> alexey.goncharuk@gmail.com> wrote:
> > > >>
> > > >> > Anton,
> > > >> >
> > > >> >
> > > >> > > >>  The solution which Anton suggested does not look easy
> because
> > it
> > > >> will
> > > >> > > most likely significantly hurt performance
> > > >> > > Mostly agree here, but what drop do we expect? What price do we
> > > ready
> > > >> to
> > > >> > > pay?
> > > >> > > Not sure, but seems some vendors ready to pay, for example, 5%
> > drop
> > > >> for
> > > >> > > this.
> > > >> >
> > > >> > 5% may be a big drop for some use-cases, so I think we should look
> > at
> > > >> how
> > > >> > to improve performance, not how to make it worse.
> > > >> >
> > > >> >
> > > >> > >
> > > >> > > >> it is hard to maintain a data structure to choose "page from
> > > >> free-list
> > > >> > > with enough space closest to the beginning of the file".
> > > >> > > We can just split each free-list bucket to the couple and use
> > first
> > > >> for
> > > >> > > pages in the first half of the file and the second for the last.
> > > >> > > Only two buckets required here since, during the file shrink,
> > first
> > > >> > > bucket's window will be shrank too.
> > > >> > > Seems, this give us the same price on put, just use the first
> > bucket
> > > >> in
> > > >> > > case it's not empty.
> > > >> > > Remove price (with merge) will be increased, of course.
> > > >> > >
> > > >> > > The compromise solution is to have priority put (to the first
> path
> > > of
> > > >> the
> > > >> > > file), with keeping removal as is, and schedulable per-page
> > > migration
> > > >> for
> > > >> > > the rest of the data during the low activity period.
> > > >> > >
> > > >> > Free lists are large and slow by themselves, it is expensive to
> > > >> checkpoint
> > > >> > and read them on start, so as a long-term solution I would look
> into
> > > >> > removing them. Moreover, not sure if adding yet another background
> > > >> process
> > > >> > will improve the codebase reliability and simplicity.
> > > >> >
> > > >> > If we want to go the hard path, I would look at free page tracking
> > > >> bitmap -
> > > >> > a special bitmask page, where each page in an adjacent block is
> > marked
> > > >> as 0
> > > >> > if it has free space more than a certain configurable threshold
> > (say,
> > > >> 80%)
> > > >> > - free, and 1 if less (full). Some vendors have successfully
> > > implemented
> > > >> > this approach, which looks much more promising, but harder to
> > > implement.
> > > >> >
> > > >> > --AG
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexey Goncharuk <al...@gmail.com>.
Anton,

Switching a partition to and from the SHRINKING state will require
intricate synchronizations in order to properly determine the start
position for historical rebalance without PME.

I would still go with an offline-node approach, but instead of cleaning the
persistence, we can do effective defragmentation when the node is offline
because we are sure that there is no concurrent load. After the
defragmentation completes, we bring the node back to the cluster and
historical rebalance will kick in automatically. It will still require
manual node restarts, but since the data is not removed, there are no
additional risks. Also, this will be an excellent solution for those who
can afford downtime and execute the defragment command on all nodes in the
cluster simultaneously - this will be the fastest way possible.
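
For illustration, a deliberately simplified, self-contained sketch of the
copy-and-swap step such an offline defragment command would perform. Real
Ignite partition files are page-structured, so treating a partition as a
flat stream of live key/value pairs below is an assumption made only to show
the shape of the operation:

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.Map;

// Simplified illustration only: live entries are rewritten back to back into
// a temporary file which then replaces the fragmented one. Safe here only
// because the node is offline, so there are no concurrent reads or writes.
public class OfflinePartitionCompactor {
    static void compact(Path partitionFile,
        Iterator<Map.Entry<byte[], byte[]>> liveEntries) throws IOException {
        Path tmp = partitionFile.resolveSibling(partitionFile.getFileName() + ".tmp");

        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            while (liveEntries.hasNext()) {
                Map.Entry<byte[], byte[]> e = liveEntries.next();
                out.writeObject(e.getKey());   // entries are packed densely,
                out.writeObject(e.getValue()); // so no free-space gaps remain
            }
        }

        // On POSIX file systems a same-directory rename like this is atomic,
        // so a crash leaves either the old or the new file, never a torn one.
        Files.move(tmp, partitionFile, StandardCopyOption.REPLACE_EXISTING);
    }
}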

--AG

Mon, Sep 30, 2019 at 09:29, Anton Vinogradov <av...@apache.org>:

> Alexei,
> >> stopping fragmented node and removing partition data, then starting it
> again
>
> That's exactly what we're doing to solve the fragmentation issue.
> The problem here is that we have to perform N/B restart-rebalance
> operations (N - cluster size, B - backups count) and it takes a lot of time
> with risks to lose the data.
>
> On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > Probably this should be allowed to do using public API, actually this is
> > same as manual rebalancing.
> >
> > пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com>:
> >
> > > The poor man's solution for the problem would be stopping fragmented
> node
> > > and removing partition data, then starting it again allowing full state
> > > transfer already without deletes.
> > > Rinse and repeat for all owners.
> > >
> > > Anton Vinogradov, would this work for you as workaround ?
> > >
> > > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av...@apache.org>:
> > >
> > >> Alexey,
> > >>
> > >> Let's combine your and Ivan's proposals.
> > >>
> > >> >> vacuum command, which acquires exclusive table lock, so no
> concurrent
> > >> activities on the table are possible.
> > >> and
> > >> >> Could the problem be solved by stopping a node which needs to be
> > >> defragmented, clearing persistence files and restarting the node?
> > >> >> After rebalancing the node will receive all data back without
> > >> fragmentation.
> > >>
> > >> How about to have special partition state SHRINKING?
> > >> This state should mean that partition unavailable for reads and
> updates
> > >> but
> > >> should keep it's update-counters and should not be marked as lost,
> > renting
> > >> or evicted.
> > >> At this state we able to iterate over the partition and apply it's
> > entries
> > >> to another file in a compact way.
> > >> Indices should be updated during the copy-on-shrink procedure or at
> the
> > >> shrink completion.
> > >> Once shrank file is ready we should replace the original partition
> file
> > >> with it and mark it as MOVING which will start the historical
> rebalance.
> > >> Shrinking should be performed during the low activity periods, but
> even
> > in
> > >> case we found that activity was high and historical rebalance is not
> > >> suitable we may just remove the file and use regular rebalance to
> > restore
> > >> the partition (this will also lead to shrink).
> > >>
> > >> BTW, seems, we able to implement partition shrink in a cheap way.
> > >> We may just use rebalancing code to apply fat partition's entries to
> the
> > >> new file.
> > >> So, 3 stages here: local rebalance, indices update and global
> historical
> > >> rebalance.
> > >>
> > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > >> alexey.goncharuk@gmail.com> wrote:
> > >>
> > >> > Anton,
> > >> >
> > >> >
> > >> > > >>  The solution which Anton suggested does not look easy because
> it
> > >> will
> > >> > > most likely significantly hurt performance
> > >> > > Mostly agree here, but what drop do we expect? What price do we
> > ready
> > >> to
> > >> > > pay?
> > >> > > Not sure, but seems some vendors ready to pay, for example, 5%
> drop
> > >> for
> > >> > > this.
> > >> >
> > >> > 5% may be a big drop for some use-cases, so I think we should look
> at
> > >> how
> > >> > to improve performance, not how to make it worse.
> > >> >
> > >> >
> > >> > >
> > >> > > >> it is hard to maintain a data structure to choose "page from
> > >> free-list
> > >> > > with enough space closest to the beginning of the file".
> > >> > > We can just split each free-list bucket to the couple and use
> first
> > >> for
> > >> > > pages in the first half of the file and the second for the last.
> > >> > > Only two buckets required here since, during the file shrink,
> first
> > >> > > bucket's window will be shrank too.
> > >> > > Seems, this give us the same price on put, just use the first
> bucket
> > >> in
> > >> > > case it's not empty.
> > >> > > Remove price (with merge) will be increased, of course.
> > >> > >
> > >> > > The compromise solution is to have priority put (to the first path
> > of
> > >> the
> > >> > > file), with keeping removal as is, and schedulable per-page
> > migration
> > >> for
> > >> > > the rest of the data during the low activity period.
> > >> > >
> > >> > Free lists are large and slow by themselves, it is expensive to
> > >> checkpoint
> > >> > and read them on start, so as a long-term solution I would look into
> > >> > removing them. Moreover, not sure if adding yet another background
> > >> process
> > >> > will improve the codebase reliability and simplicity.
> > >> >
> > >> > If we want to go the hard path, I would look at free page tracking
> > >> bitmap -
> > >> > a special bitmask page, where each page in an adjacent block is
> marked
> > >> as 0
> > >> > if it has free space more than a certain configurable threshold
> (say,
> > >> 80%)
> > >> > - free, and 1 if less (full). Some vendors have successfully
> > implemented
> > >> > this approach, which looks much more promising, but harder to
> > implement.
> > >> >
> > >> > --AG
> > >> >
> > >>
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Anton Vinogradov <av...@apache.org>.
Alexei,
>> stopping fragmented node and removing partition data, then starting it
again

That's exactly what we're doing to solve the fragmentation issue.
The problem here is that we have to perform N/B restart-rebalance
operations (N - cluster size, B - backups count), e.g. 6 such rounds for a
12-node cluster with 2 backups, and it takes a lot of time with a risk of
losing data.

On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Probably this should be allowed to do using public API, actually this is
> same as manual rebalancing.
>
> пт, 27 сент. 2019 г. в 17:40, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com>:
>
> > The poor man's solution for the problem would be stopping fragmented node
> > and removing partition data, then starting it again allowing full state
> > transfer already without deletes.
> > Rinse and repeat for all owners.
> >
> > Anton Vinogradov, would this work for you as workaround ?
> >
> > чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av...@apache.org>:
> >
> >> Alexey,
> >>
> >> Let's combine your and Ivan's proposals.
> >>
> >> >> vacuum command, which acquires exclusive table lock, so no concurrent
> >> activities on the table are possible.
> >> and
> >> >> Could the problem be solved by stopping a node which needs to be
> >> defragmented, clearing persistence files and restarting the node?
> >> >> After rebalancing the node will receive all data back without
> >> fragmentation.
> >>
> >> How about to have special partition state SHRINKING?
> >> This state should mean that partition unavailable for reads and updates
> >> but
> >> should keep it's update-counters and should not be marked as lost,
> renting
> >> or evicted.
> >> At this state we able to iterate over the partition and apply it's
> entries
> >> to another file in a compact way.
> >> Indices should be updated during the copy-on-shrink procedure or at the
> >> shrink completion.
> >> Once shrank file is ready we should replace the original partition file
> >> with it and mark it as MOVING which will start the historical rebalance.
> >> Shrinking should be performed during the low activity periods, but even
> in
> >> case we found that activity was high and historical rebalance is not
> >> suitable we may just remove the file and use regular rebalance to
> restore
> >> the partition (this will also lead to shrink).
> >>
> >> BTW, seems, we able to implement partition shrink in a cheap way.
> >> We may just use rebalancing code to apply fat partition's entries to the
> >> new file.
> >> So, 3 stages here: local rebalance, indices update and global historical
> >> rebalance.
> >>
> >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> >> alexey.goncharuk@gmail.com> wrote:
> >>
> >> > Anton,
> >> >
> >> >
> >> > > >>  The solution which Anton suggested does not look easy because it
> >> will
> >> > > most likely significantly hurt performance
> >> > > Mostly agree here, but what drop do we expect? What price do we
> ready
> >> to
> >> > > pay?
> >> > > Not sure, but seems some vendors ready to pay, for example, 5% drop
> >> for
> >> > > this.
> >> >
> >> > 5% may be a big drop for some use-cases, so I think we should look at
> >> how
> >> > to improve performance, not how to make it worse.
> >> >
> >> >
> >> > >
> >> > > >> it is hard to maintain a data structure to choose "page from
> >> free-list
> >> > > with enough space closest to the beginning of the file".
> >> > > We can just split each free-list bucket to the couple and use first
> >> for
> >> > > pages in the first half of the file and the second for the last.
> >> > > Only two buckets required here since, during the file shrink, first
> >> > > bucket's window will be shrank too.
> >> > > Seems, this give us the same price on put, just use the first bucket
> >> in
> >> > > case it's not empty.
> >> > > Remove price (with merge) will be increased, of course.
> >> > >
> >> > > The compromise solution is to have priority put (to the first path
> of
> >> the
> >> > > file), with keeping removal as is, and schedulable per-page
> migration
> >> for
> >> > > the rest of the data during the low activity period.
> >> > >
> >> > Free lists are large and slow by themselves, it is expensive to
> >> checkpoint
> >> > and read them on start, so as a long-term solution I would look into
> >> > removing them. Moreover, not sure if adding yet another background
> >> process
> >> > will improve the codebase reliability and simplicity.
> >> >
> >> > If we want to go the hard path, I would look at free page tracking
> >> bitmap -
> >> > a special bitmask page, where each page in an adjacent block is marked
> >> as 0
> >> > if it has free space more than a certain configurable threshold (say,
> >> 80%)
> >> > - free, and 1 if less (full). Some vendors have successfully
> implemented
> >> > this approach, which looks much more promising, but harder to
> implement.
> >> >
> >> > --AG
> >> >
> >>
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexei Scherbakov <al...@gmail.com>.
Probably this should be allowed via a public API; actually, this is the
same as manual rebalancing.
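
For reference, a minimal sketch of what manual rebalancing looks like
through the public API today, assuming the CacheConfiguration#setRebalanceDelay(-1)
plus IgniteCache#rebalance() combination exposed by Ignite 2.x; a public
defragmentation trigger could plausibly follow the same shape:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ManualRebalanceSketch {
    public static void main(String[] args) {
        // rebalanceDelay = -1 disables automatic rebalancing for the cache,
        // so data transfer only starts when requested explicitly.
        CacheConfiguration<Integer, String> cacheCfg =
            new CacheConfiguration<Integer, String>("myCache")
                .setRebalanceDelay(-1);

        IgniteConfiguration cfg =
            new IgniteConfiguration().setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            IgniteCache<Integer, String> cache = ignite.cache("myCache");

            // Later, at a convenient moment, rebalancing is requested by hand.
            cache.rebalance();
        }
    }
}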

Fri, Sep 27, 2019 at 17:40, Alexei Scherbakov <
alexey.scherbakoff@gmail.com>:

> The poor man's solution for the problem would be stopping fragmented node
> and removing partition data, then starting it again allowing full state
> transfer already without deletes.
> Rinse and repeat for all owners.
>
> Anton Vinogradov, would this work for you as workaround ?
>
> чт, 19 сент. 2019 г. в 13:03, Anton Vinogradov <av...@apache.org>:
>
>> Alexey,
>>
>> Let's combine your and Ivan's proposals.
>>
>> >> vacuum command, which acquires exclusive table lock, so no concurrent
>> activities on the table are possible.
>> and
>> >> Could the problem be solved by stopping a node which needs to be
>> defragmented, clearing persistence files and restarting the node?
>> >> After rebalancing the node will receive all data back without
>> fragmentation.
>>
>> How about to have special partition state SHRINKING?
>> This state should mean that partition unavailable for reads and updates
>> but
>> should keep it's update-counters and should not be marked as lost, renting
>> or evicted.
>> At this state we able to iterate over the partition and apply it's entries
>> to another file in a compact way.
>> Indices should be updated during the copy-on-shrink procedure or at the
>> shrink completion.
>> Once shrank file is ready we should replace the original partition file
>> with it and mark it as MOVING which will start the historical rebalance.
>> Shrinking should be performed during the low activity periods, but even in
>> case we found that activity was high and historical rebalance is not
>> suitable we may just remove the file and use regular rebalance to restore
>> the partition (this will also lead to shrink).
>>
>> BTW, seems, we able to implement partition shrink in a cheap way.
>> We may just use rebalancing code to apply fat partition's entries to the
>> new file.
>> So, 3 stages here: local rebalance, indices update and global historical
>> rebalance.
>>
>> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
>> alexey.goncharuk@gmail.com> wrote:
>>
>> > Anton,
>> >
>> >
>> > > >>  The solution which Anton suggested does not look easy because it
>> will
>> > > most likely significantly hurt performance
>> > > Mostly agree here, but what drop do we expect? What price do we ready
>> to
>> > > pay?
>> > > Not sure, but seems some vendors ready to pay, for example, 5% drop
>> for
>> > > this.
>> >
>> > 5% may be a big drop for some use-cases, so I think we should look at
>> how
>> > to improve performance, not how to make it worse.
>> >
>> >
>> > >
>> > > >> it is hard to maintain a data structure to choose "page from
>> free-list
>> > > with enough space closest to the beginning of the file".
>> > > We can just split each free-list bucket to the couple and use first
>> for
>> > > pages in the first half of the file and the second for the last.
>> > > Only two buckets required here since, during the file shrink, first
>> > > bucket's window will be shrank too.
>> > > Seems, this give us the same price on put, just use the first bucket
>> in
>> > > case it's not empty.
>> > > Remove price (with merge) will be increased, of course.
>> > >
>> > > The compromise solution is to have priority put (to the first path of
>> the
>> > > file), with keeping removal as is, and schedulable per-page migration
>> for
>> > > the rest of the data during the low activity period.
>> > >
>> > Free lists are large and slow by themselves, it is expensive to
>> checkpoint
>> > and read them on start, so as a long-term solution I would look into
>> > removing them. Moreover, not sure if adding yet another background
>> process
>> > will improve the codebase reliability and simplicity.
>> >
>> > If we want to go the hard path, I would look at free page tracking
>> bitmap -
>> > a special bitmask page, where each page in an adjacent block is marked
>> as 0
>> > if it has free space more than a certain configurable threshold (say,
>> 80%)
>> > - free, and 1 if less (full). Some vendors have successfully implemented
>> > this approach, which looks much more promising, but harder to implement.
>> >
>> > --AG
>> >
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexei Scherbakov <al...@gmail.com>.
The poor man's solution to the problem would be stopping the fragmented
node and removing its partition data, then starting it again and letting a
full state transfer bring the data back, this time without the deleted
entries.
Rinse and repeat for all owners.

Anton Vinogradov, would this work for you as a workaround?

Thu, Sep 19, 2019 at 13:03, Anton Vinogradov <av...@apache.org>:

> Alexey,
>
> Let's combine your and Ivan's proposals.
>
> >> vacuum command, which acquires exclusive table lock, so no concurrent
> activities on the table are possible.
> and
> >> Could the problem be solved by stopping a node which needs to be
> defragmented, clearing persistence files and restarting the node?
> >> After rebalancing the node will receive all data back without
> fragmentation.
>
> How about to have special partition state SHRINKING?
> This state should mean that partition unavailable for reads and updates but
> should keep it's update-counters and should not be marked as lost, renting
> or evicted.
> At this state we able to iterate over the partition and apply it's entries
> to another file in a compact way.
> Indices should be updated during the copy-on-shrink procedure or at the
> shrink completion.
> Once shrank file is ready we should replace the original partition file
> with it and mark it as MOVING which will start the historical rebalance.
> Shrinking should be performed during the low activity periods, but even in
> case we found that activity was high and historical rebalance is not
> suitable we may just remove the file and use regular rebalance to restore
> the partition (this will also lead to shrink).
>
> BTW, seems, we able to implement partition shrink in a cheap way.
> We may just use rebalancing code to apply fat partition's entries to the
> new file.
> So, 3 stages here: local rebalance, indices update and global historical
> rebalance.
>
> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> alexey.goncharuk@gmail.com> wrote:
>
> > Anton,
> >
> >
> > > >>  The solution which Anton suggested does not look easy because it
> will
> > > most likely significantly hurt performance
> > > Mostly agree here, but what drop do we expect? What price do we ready
> to
> > > pay?
> > > Not sure, but seems some vendors ready to pay, for example, 5% drop for
> > > this.
> >
> > 5% may be a big drop for some use-cases, so I think we should look at how
> > to improve performance, not how to make it worse.
> >
> >
> > >
> > > >> it is hard to maintain a data structure to choose "page from
> free-list
> > > with enough space closest to the beginning of the file".
> > > We can just split each free-list bucket to the couple and use first for
> > > pages in the first half of the file and the second for the last.
> > > Only two buckets required here since, during the file shrink, first
> > > bucket's window will be shrank too.
> > > Seems, this give us the same price on put, just use the first bucket in
> > > case it's not empty.
> > > Remove price (with merge) will be increased, of course.
> > >
> > > The compromise solution is to have priority put (to the first path of
> the
> > > file), with keeping removal as is, and schedulable per-page migration
> for
> > > the rest of the data during the low activity period.
> > >
> > Free lists are large and slow by themselves, it is expensive to
> checkpoint
> > and read them on start, so as a long-term solution I would look into
> > removing them. Moreover, not sure if adding yet another background
> process
> > will improve the codebase reliability and simplicity.
> >
> > If we want to go the hard path, I would look at free page tracking
> bitmap -
> > a special bitmask page, where each page in an adjacent block is marked
> as 0
> > if it has free space more than a certain configurable threshold (say,
> 80%)
> > - free, and 1 if less (full). Some vendors have successfully implemented
> > this approach, which looks much more promising, but harder to implement.
> >
> > --AG
> >
>


-- 

Best regards,
Alexei Scherbakov

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Anton Vinogradov <av...@apache.org>.
Alexey,

Let's combine your and Ivan's proposals.

>> vacuum command, which acquires exclusive table lock, so no concurrent
activities on the table are possible.
and
>> Could the problem be solved by stopping a node which needs to be
defragmented, clearing persistence files and restarting the node?
>> After rebalancing the node will receive all data back without
fragmentation.

How about to have special partition state SHRINKING?
This state should mean that partition unavailable for reads and updates but
should keep it's update-counters and should not be marked as lost, renting
or evicted.
At this state we able to iterate over the partition and apply it's entries
to another file in a compact way.
Indices should be updated during the copy-on-shrink procedure or at the
shrink completion.
Once shrank file is ready we should replace the original partition file
with it and mark it as MOVING which will start the historical rebalance.
Shrinking should be performed during the low activity periods, but even in
case we found that activity was high and historical rebalance is not
suitable we may just remove the file and use regular rebalance to restore
the partition (this will also lead to shrink).

BTW, it seems we are able to implement partition shrink in a cheap way:
we may just reuse the rebalancing code to apply the fat partition's
entries to the new file.
So, 3 stages here: local rebalance, index update, and global historical
rebalance.
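
For illustration, a minimal Java sketch of the copy-on-shrink flow
described above. All type and method names here (Partition, CompactFile,
requestHistoricalRebalance, etc.) are hypothetical placeholders for the
idea, not existing Ignite internals:

import java.util.Iterator;

class PartitionShrinker {
    enum State { OWNING, SHRINKING, MOVING }

    interface Partition {
        void state(State s);                 // switch partition state
        Iterator<byte[]> entries();          // live entries of the fat file
        void rebuildIndexes();               // stage 2: index update
        void replaceFile(CompactFile f);     // atomically swap partition file
        void requestHistoricalRebalance();   // stage 3: catch up via WAL delta
    }

    interface CompactFile {
        void append(byte[] entry);           // entries land densely, no gaps
    }

    void shrink(Partition part, CompactFile target) {
        part.state(State.SHRINKING);         // no reads/updates; counters kept

        Iterator<byte[]> it = part.entries();
        while (it.hasNext())                 // stage 1: "local rebalance"
            target.append(it.next());

        part.rebuildIndexes();
        part.replaceFile(target);

        part.state(State.MOVING);            // historical rebalance brings the
        part.requestHistoricalRebalance();   // partition back up to date
    }
}

If activity turns out to be too high for historical rebalance, the caller
may simply drop the new file and fall back to a full rebalance, as noted
above.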

On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> Anton,
>
>
> > >>  The solution which Anton suggested does not look easy because it will
> > most likely significantly hurt performance
> > Mostly agree here, but what drop do we expect? What price do we ready to
> > pay?
> > Not sure, but seems some vendors ready to pay, for example, 5% drop for
> > this.
>
> 5% may be a big drop for some use-cases, so I think we should look at how
> to improve performance, not how to make it worse.
>
>
> >
> > >> it is hard to maintain a data structure to choose "page from free-list
> > with enough space closest to the beginning of the file".
> > We can just split each free-list bucket to the couple and use first for
> > pages in the first half of the file and the second for the last.
> > Only two buckets required here since, during the file shrink, first
> > bucket's window will be shrank too.
> > Seems, this give us the same price on put, just use the first bucket in
> > case it's not empty.
> > Remove price (with merge) will be increased, of course.
> >
> > The compromise solution is to have priority put (to the first path of the
> > file), with keeping removal as is, and schedulable per-page migration for
> > the rest of the data during the low activity period.
> >
> Free lists are large and slow by themselves, it is expensive to checkpoint
> and read them on start, so as a long-term solution I would look into
> removing them. Moreover, not sure if adding yet another background process
> will improve the codebase reliability and simplicity.
>
> If we want to go the hard path, I would look at free page tracking bitmap -
> a special bitmask page, where each page in an adjacent block is marked as 0
> if it has free space more than a certain configurable threshold (say, 80%)
> - free, and 1 if less (full). Some vendors have successfully implemented
> this approach, which looks much more promising, but harder to implement.
>
> --AG
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexey Goncharuk <al...@gmail.com>.
Anton,


> >>  The solution which Anton suggested does not look easy because it will
> most likely significantly hurt performance
> Mostly agree here, but what drop do we expect? What price do we ready to
> pay?
> Not sure, but seems some vendors ready to pay, for example, 5% drop for
> this.

5% may be a big drop for some use-cases, so I think we should look at how
to improve performance, not how to make it worse.


>
> >> it is hard to maintain a data structure to choose "page from free-list
> with enough space closest to the beginning of the file".
> We can just split each free-list bucket to the couple and use first for
> pages in the first half of the file and the second for the last.
> Only two buckets required here since, during the file shrink, first
> bucket's window will be shrank too.
> Seems, this give us the same price on put, just use the first bucket in
> case it's not empty.
> Remove price (with merge) will be increased, of course.
>
> The compromise solution is to have priority put (to the first path of the
> file), with keeping removal as is, and schedulable per-page migration for
> the rest of the data during the low activity period.
>
Free lists are large and slow by themselves; they are expensive to
checkpoint and to read on start, so as a long-term solution I would look
into removing them. Moreover, I am not sure that adding yet another
background process will improve the codebase's reliability and simplicity.

If we want to go down the hard path, I would look at a free page tracking
bitmap - a special bitmask page where each page in an adjacent block is
marked 0 if its free space exceeds a certain configurable threshold (say,
80%) - free - and 1 otherwise (full). Some vendors have successfully
implemented this approach; it looks much more promising, but is harder to
implement.
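
A minimal sketch of how such a tracking bitmap could look, assuming one
bitmask page covers a fixed block of data pages (the block size, names and
threshold handling are assumptions for the example only):

import java.util.BitSet;

class FreePageBitmap {
    private static final int PAGES_PER_BLOCK = 4096;  // pages covered by one bitmask page
    private final double freeThreshold;               // e.g. 0.8 = "at least 80% free"
    private final BitSet bits = new BitSet(PAGES_PER_BLOCK); // 1 = full, 0 = free

    FreePageBitmap(double freeThreshold) {
        this.freeThreshold = freeThreshold;
    }

    /** Called whenever a put/remove changes a page's fill factor. */
    void onPageUpdated(int pageIdx, int freeBytes, int pageSize) {
        boolean free = freeBytes > freeThreshold * pageSize;
        bits.set(pageIdx, !free);                      // mark 1 when the page is "full"
    }

    /** Lowest-index page in the block with enough free space, or -1 if none. */
    int firstFreePage() {
        int idx = bits.nextClearBit(0);
        return idx < PAGES_PER_BLOCK ? idx : -1;
    }
}

Picking the lowest clear bit also naturally prefers pages closest to the
beginning of the file, which is the property discussed earlier in this
thread.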

--AG

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Anton Vinogradov <av...@apache.org>.
Alexey,

>>  The solution which Anton suggested does not look easy because it will
most likely significantly hurt performance
Mostly agree here, but what drop do we expect? What price are we ready to
pay?
Not sure, but it seems some vendors are ready to pay, for example, a 5%
drop for this.

>> it is hard to maintain a data structure to choose "page from free-list
with enough space closest to the beginning of the file".
We can just split each free-list bucket into a pair and use the first for
pages in the first half of the file and the second for the second half.
Only two buckets are required here since, during a file shrink, the first
bucket's window shrinks too.
It seems this gives us the same price on put: just use the first bucket
if it is not empty.
The remove price (with merge) will increase, of course.

The compromise solution is to have a priority put (to the first part of
the file), keep removal as is, and add schedulable per-page migration of
the rest of the data during low-activity periods.
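
As a rough illustration of the two-bucket idea, a sketch under assumed
names and structures (the real Ignite free lists are page-based, not
in-heap collections like this):

import java.util.ArrayDeque;
import java.util.Deque;

class SplitFreeListBucket {
    private final Deque<Long> headHalf = new ArrayDeque<>(); // free pages in the first half of the file
    private final Deque<Long> tailHalf = new ArrayDeque<>(); // free pages in the second half
    private long fileSizePages;                               // current partition file size, in pages

    SplitFreeListBucket(long fileSizePages) {
        this.fileSizePages = fileSizePages;
    }

    /** A page with enough free space becomes reusable. */
    void addFreePage(long pageIdx) {
        (pageIdx < fileSizePages / 2 ? headHalf : tailHalf).push(pageIdx);
    }

    /** Put prefers the first half, so data drifts towards the file head. */
    Long takePageForPut() {
        if (!headHalf.isEmpty())
            return headHalf.pop();
        return tailHalf.isEmpty() ? null : tailHalf.pop();
    }

    /** After the file tail is truncated, the "first half" window shrinks too. */
    void onFileShrunk(long newSizePages) {
        fileSizePages = newSizePages;   // re-bucketing of already listed pages omitted
    }
}

Re-bucketing of already listed pages after a shrink and the extra
merge-on-remove cost are left out of the sketch.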

On Wed, Sep 18, 2019 at 8:03 PM Alexey Goncharuk <al...@gmail.com>
wrote:

> Denis,
>
> It's not fundamental, but quite complex. In postgres, for example, this is
> not maintained automatically and store compaction is performed using the
> full vacuum command, which acquires exclusive table lock, so no concurrent
> activities on the table are possible.
>
> The solution which Anton suggested does not look easy because it will most
> likely significantly hurt performance: it is hard to maintain a data
> structure to choose "page from free-list with enough space closest to the
> beginning of the file". Overall, we can think of something similar to
> postgres, when a space can be freed in some maintenance mode.
>
> Online space cleanup sounds tricky for me, or at least I cannot think about
> a plausible solution right away.
>
> --AG
>
> пт, 13 сент. 2019 г. в 19:43, Denis Magda <dm...@apache.org>:
>
> > The issue starts hitting others who deploy Ignite persistence in
> > production:
> > https://issues.apache.org/jira/browse/IGNITE-12152
> >
> > Alex, I'm curious is this a fundamental problem. Asked the same question
> in
> > JIRA but, probably, this discussion is a better place to get to the
> bottom
> > first:
> > https://issues.apache.org/jira/browse/IGNITE-10862
> >
> > -
> > Denis
> >
> >
> > On Thu, Jan 10, 2019 at 6:01 AM Anton Vinogradov <av...@apache.org> wrote:
> >
> > > Dmitriy,
> > >
> > > This does not look like a production-ready case :)
> > >
> > > How about
> > > 1) Once you need to write an entry - you have to chose not random "page
> > > from free-list with enough space"
> > > but "page from free-list with enough space closest to the beginning of
> > the
> > > file".
> > >
> > > 2) Once you remove entry you have to merge the rest of the entries at
> > this
> > > page to the
> > > "page from free-list with enough space closest to the beginning of the
> > > file"
> > > if possible. (optional)
> > >
> > > 3) Partition file tail with empty pages can be removed at any time.
> > >
> > > 4) In case you have cold data inside the tail, just lock the page and
> > > perform migration to
> > > "page from free-list with enough space closest to the beginning of the
> > > file".
> > > This operation can be scheduled.
> > >
> > > On Wed, Jan 9, 2019 at 4:43 PM Dmitriy Pavlov <dp...@apache.org>
> > wrote:
> > >
> > > > In the TC Bot, I used to create the second cache with CacheV2 name
> and
> > > > migrate needed data from Cache  V1 to V2.
> > > >
> > > > After CacheV1 destroy(), files are removed and disk space is freed.
> > > >
> > > > ср, 9 янв. 2019 г. в 12:04, Павлухин Иван <vo...@gmail.com>:
> > > >
> > > > > Vyacheslav,
> > > > >
> > > > > Have you investigated how other vendors (Oracle, Postgres) tackle
> > this
> > > > > problem?
> > > > >
> > > > > I have one wild idea. Could the problem be solved by stopping a
> node
> > > > > which need to be defragmented, clearing persistence files and
> > > > > restarting the node? After rebalance the node will receive all data
> > > > > back without fragmentation. I see a big downside -- sending data
> > > > > across the network. But perhaps we can play with affinity and start
> > > > > new node on the same host which will receive the same data, after
> > that
> > > > > old node can be stopped. It looks more as kind of workaround but
> > > > > perhaps it can be turned into workable solution.
> > > > >
> > > > > ср, 9 янв. 2019 г. в 10:49, Vyacheslav Daradur <
> daradurvs@gmail.com
> > >:
> > > > > >
> > > > > > Yes, it's about Page Memory defragmentation.
> > > > > >
> > > > > > Pages in partitions files are stored sequentially, possible, it
> > makes
> > > > > > sense to defragment pages first to avoid interpages gaps since we
> > use
> > > > > > pages offset to manage them.
> > > > > >
> > > > > > I filled an issue [1], I hope we will be able to find resources
> to
> > > > > > solve the issue before 2.8 release.
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-10862
> > > > > >
> > > > > > On Sat, Dec 29, 2018 at 10:47 AM Павлухин Иван <
> > vololo100@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > I suppose it is about Ignite Page Memory pages defragmentation.
> > > > > > >
> > > > > > > We can get 100 allocated pages each of which becomes only e.g.
> > 50%
> > > > > > > filled after removal some entries. But they will occupy a space
> > for
> > > > > > > 100 pages on a hard drive.
> > > > > > >
> > > > > > > пт, 28 дек. 2018 г. в 20:45, Denis Magda <dm...@apache.org>:
> > > > > > > >
> > > > > > > > Shouldn't the OS care of defragmentation? What we need to do
> is
> > > to
> > > > > give a
> > > > > > > > way to remove stale data and "release" the allocated space
> > > somehow
> > > > > through
> > > > > > > > the tools, MBeans or API methods.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Denis
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Dec 28, 2018 at 6:24 AM Vladimir Ozerov <
> > > > > vozerov@gridgain.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Vyacheslav,
> > > > > > > > >
> > > > > > > > > AFAIK this is not implemented. Shrinking/defragmentation is
> > > > > important
> > > > > > > > > optimization. Not only because it releases free space, but
> > also
> > > > > because it
> > > > > > > > > decreases total number of pages. But is it not very easy to
> > > > > implement, as
> > > > > > > > > you have to both reshuffle data entries and index entries,
> > > > > maintaining
> > > > > > > > > consistency for concurrent reads and updates at the same
> > time.
> > > Or
> > > > > > > > > alternatively we can think of offline defragmentation. It
> > will
> > > be
> > > > > easier to
> > > > > > > > > implement and faster, but concurrent operations will be
> > > > prohibited.
> > > > > > > > >
> > > > > > > > > On Fri, Dec 28, 2018 at 4:08 PM Vyacheslav Daradur <
> > > > > daradurvs@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Igniters, we have faced with the following problem on one
> > of
> > > > our
> > > > > > > > > > deployments.
> > > > > > > > > >
> > > > > > > > > > Let's imagine that we have used IgniteCache with enabled
> > PDS
> > > > > during the
> > > > > > > > > > time:
> > > > > > > > > > - hardware disc space has been occupied during growing up
> > of
> > > an
> > > > > amount
> > > > > > > > > > of data, e.g. 100Gb;
> > > > > > > > > > - then, we removed non-actual data, e.g 50Gb, which
> became
> > > > > useless for
> > > > > > > > > us;
> > > > > > > > > > - disc space stopped growing up with new data, but it was
> > not
> > > > > > > > > > released, and still took 100Gb, instead of expected 50Gb;
> > > > > > > > > >
> > > > > > > > > > Another use case:
> > > > > > > > > > - a user extracts data from IgniteCache to store it in
> > > separate
> > > > > > > > > > IgniteCache or another store;
> > > > > > > > > > - disc still is occupied and the user is not able to
> store
> > > data
> > > > > in the
> > > > > > > > > > different cache at the same cluster because of disc
> > > limitation;
> > > > > > > > > >
> > > > > > > > > > How can we help the user to free up the disc space, if an
> > > > amount
> > > > > of
> > > > > > > > > > data in IgniteCache has been reduced many times and will
> > not
> > > be
> > > > > > > > > > increased in the nearest future?
> > > > > > > > > >
> > > > > > > > > > AFAIK, we have mechanics of reusing memory pages, that
> > allows
> > > > us
> > > > > to
> > > > > > > > > > use pages which have been allocated and stored removed
> data
> > > for
> > > > > > > > > > storing new data.
> > > > > > > > > > Are there any chances to shrink data and free up space on
> > > disc
> > > > > (with
> > > > > > > > > > defragmentation if possible)?
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Ivan Pavlukhin
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav D.
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > > >
> > > >
> > >
> >
>

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

Posted by Alexey Goncharuk <al...@gmail.com>.
Denis,

It's not fundamental, but quite complex. In Postgres, for example, this is
not maintained automatically; store compaction is performed using the full
vacuum command, which acquires an exclusive table lock, so no concurrent
activity on the table is possible.

The solution which Anton suggested does not look easy because it will most
likely significantly hurt performance: it is hard to maintain a data
structure to choose the "page from the free-list with enough space closest
to the beginning of the file". Overall, we can think of something similar
to Postgres, where space can be freed in some maintenance mode.

Online space cleanup sounds tricky to me, or at least I cannot think of a
plausible solution right away.

--AG

пт, 13 сент. 2019 г. в 19:43, Denis Magda <dm...@apache.org>:

> The issue starts hitting others who deploy Ignite persistence in
> production:
> https://issues.apache.org/jira/browse/IGNITE-12152
>
> Alex, I'm curious is this a fundamental problem. Asked the same question in
> JIRA but, probably, this discussion is a better place to get to the bottom
> first:
> https://issues.apache.org/jira/browse/IGNITE-10862
>
> -
> Denis
>
>
> On Thu, Jan 10, 2019 at 6:01 AM Anton Vinogradov <av...@apache.org> wrote:
>
> > Dmitriy,
> >
> > This does not look like a production-ready case :)
> >
> > How about
> > 1) Once you need to write an entry - you have to chose not random "page
> > from free-list with enough space"
> > but "page from free-list with enough space closest to the beginning of
> the
> > file".
> >
> > 2) Once you remove entry you have to merge the rest of the entries at
> this
> > page to the
> > "page from free-list with enough space closest to the beginning of the
> > file"
> > if possible. (optional)
> >
> > 3) Partition file tail with empty pages can be removed at any time.
> >
> > 4) In case you have cold data inside the tail, just lock the page and
> > perform migration to
> > "page from free-list with enough space closest to the beginning of the
> > file".
> > This operation can be scheduled.
> >
> > On Wed, Jan 9, 2019 at 4:43 PM Dmitriy Pavlov <dp...@apache.org>
> wrote:
> >
> > > In the TC Bot, I used to create the second cache with CacheV2 name and
> > > migrate needed data from Cache  V1 to V2.
> > >
> > > After CacheV1 destroy(), files are removed and disk space is freed.
> > >
> > > ср, 9 янв. 2019 г. в 12:04, Павлухин Иван <vo...@gmail.com>:
> > >
> > > > Vyacheslav,
> > > >
> > > > Have you investigated how other vendors (Oracle, Postgres) tackle
> this
> > > > problem?
> > > >
> > > > I have one wild idea. Could the problem be solved by stopping a node
> > > > which need to be defragmented, clearing persistence files and
> > > > restarting the node? After rebalance the node will receive all data
> > > > back without fragmentation. I see a big downside -- sending data
> > > > across the network. But perhaps we can play with affinity and start
> > > > new node on the same host which will receive the same data, after
> that
> > > > old node can be stopped. It looks more as kind of workaround but
> > > > perhaps it can be turned into workable solution.
> > > >
> > > > ср, 9 янв. 2019 г. в 10:49, Vyacheslav Daradur <daradurvs@gmail.com
> >:
> > > > >
> > > > > Yes, it's about Page Memory defragmentation.
> > > > >
> > > > > Pages in partitions files are stored sequentially, possible, it
> makes
> > > > > sense to defragment pages first to avoid interpages gaps since we
> use
> > > > > pages offset to manage them.
> > > > >
> > > > > I filled an issue [1], I hope we will be able to find resources to
> > > > > solve the issue before 2.8 release.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-10862
> > > > >
> > > > > On Sat, Dec 29, 2018 at 10:47 AM Павлухин Иван <
> vololo100@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > I suppose it is about Ignite Page Memory pages defragmentation.
> > > > > >
> > > > > > We can get 100 allocated pages each of which becomes only e.g.
> 50%
> > > > > > filled after removal some entries. But they will occupy a space
> for
> > > > > > 100 pages on a hard drive.
> > > > > >
> > > > > > пт, 28 дек. 2018 г. в 20:45, Denis Magda <dm...@apache.org>:
> > > > > > >
> > > > > > > Shouldn't the OS care of defragmentation? What we need to do is
> > to
> > > > give a
> > > > > > > way to remove stale data and "release" the allocated space
> > somehow
> > > > through
> > > > > > > the tools, MBeans or API methods.
> > > > > > >
> > > > > > > --
> > > > > > > Denis
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Dec 28, 2018 at 6:24 AM Vladimir Ozerov <
> > > > vozerov@gridgain.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Vyacheslav,
> > > > > > > >
> > > > > > > > AFAIK this is not implemented. Shrinking/defragmentation is
> > > > important
> > > > > > > > optimization. Not only because it releases free space, but
> also
> > > > because it
> > > > > > > > decreases total number of pages. But is it not very easy to
> > > > implement, as
> > > > > > > > you have to both reshuffle data entries and index entries,
> > > > maintaining
> > > > > > > > consistency for concurrent reads and updates at the same
> time.
> > Or
> > > > > > > > alternatively we can think of offline defragmentation. It
> will
> > be
> > > > easier to
> > > > > > > > implement and faster, but concurrent operations will be
> > > prohibited.
> > > > > > > >
> > > > > > > > On Fri, Dec 28, 2018 at 4:08 PM Vyacheslav Daradur <
> > > > daradurvs@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Igniters, we have faced with the following problem on one
> of
> > > our
> > > > > > > > > deployments.
> > > > > > > > >
> > > > > > > > > Let's imagine that we have used IgniteCache with enabled
> PDS
> > > > during the
> > > > > > > > > time:
> > > > > > > > > - hardware disc space has been occupied during growing up
> of
> > an
> > > > amount
> > > > > > > > > of data, e.g. 100Gb;
> > > > > > > > > - then, we removed non-actual data, e.g 50Gb, which became
> > > > useless for
> > > > > > > > us;
> > > > > > > > > - disc space stopped growing up with new data, but it was
> not
> > > > > > > > > released, and still took 100Gb, instead of expected 50Gb;
> > > > > > > > >
> > > > > > > > > Another use case:
> > > > > > > > > - a user extracts data from IgniteCache to store it in
> > separate
> > > > > > > > > IgniteCache or another store;
> > > > > > > > > - disc still is occupied and the user is not able to store
> > data
> > > > in the
> > > > > > > > > different cache at the same cluster because of disc
> > limitation;
> > > > > > > > >
> > > > > > > > > How can we help the user to free up the disc space, if an
> > > amount
> > > > of
> > > > > > > > > data in IgniteCache has been reduced many times and will
> not
> > be
> > > > > > > > > increased in the nearest future?
> > > > > > > > >
> > > > > > > > > AFAIK, we have mechanics of reusing memory pages, that
> allows
> > > us
> > > > to
> > > > > > > > > use pages which have been allocated and stored removed data
> > for
> > > > > > > > > storing new data.
> > > > > > > > > Are there any chances to shrink data and free up space on
> > disc
> > > > (with
> > > > > > > > > defragmentation if possible)?
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Ivan Pavlukhin
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards, Vyacheslav D.
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> > >
> >
>