Posted to dev@ignite.apache.org by Vyacheslav Daradur <da...@gmail.com> on 2018/03/05 17:18:16 UTC

Data compression design proposal

Hi Igniters!

I’d like to take the next step in our data compression discussion [1].

Most Igniters vote for per-data-page compression.

I’d like to accumulate the main theses to start the implementation:
- a page will be compressed with a dictionary-based approach (e.g. LZV)
- a page will be compressed in batch mode (not on every change)
- page compression should be initiated by an event, for example, when a
page’s free space drops below 20%
- the compression process will run under the page write lock

Vladimir Ozerov has written:
>> What we do not understand yet:
>> 1) Granularity of compression algorithm.
>> 1.1) It could be per-entry - i.e. we compress the whole entry content, but
>> respect boundaries between entries. E.g.: before - [ENTRY_1][ENTRY_2],
>> after - [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to [COMPRESSED ENTRY_1 and ENTRY_2]).
>> 1.2) Or it could be per-field - i.e. we compress fields, but respect binary
>> object layout. First approach is simple, straightforward, and will give
>> acceptable compression rate, but we will have to compress the whole binary
>> object on every field access, which may ruin our SQL performance. Second
>> approach is more complex, we are not sure about its compression rate, but
>> as BinaryObject structure is preserved, we will still have fast
>> constant-time per-field access.

I think both approaches have advantages, and we will be able to
compare the approaches and algorithms after a prototype
implementation.

Main approach in brief (see the sketch below):
1) When a page’s free space drops below 20%, a compression event is triggered
2) The page is locked with a write lock
3) The page is passed to the page compressor implementation
4) The page is replaced by the compressed page
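A minimal sketch of this flow (not Ignite code): the PageCompressor interface,
the Page class and its lock are assumptions made for illustration only, and
java.util.zip.Deflater stands in for the dictionary-based algorithm (e.g. LZV).

import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.zip.Deflater;

interface PageCompressor {
    /** Returns the compressed page image, or null if compression did not pay off. */
    byte[] compress(byte[] pageBytes);
}

class DeflatePageCompressor implements PageCompressor {
    @Override public byte[] compress(byte[] pageBytes) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(pageBytes);
        deflater.finish();

        byte[] out = new byte[pageBytes.length];
        int len = deflater.deflate(out);
        boolean done = deflater.finished();
        deflater.end();

        // Keep the page as-is if the compressed image is not smaller than the original.
        if (!done || len >= pageBytes.length)
            return null;

        return java.util.Arrays.copyOf(out, len);
    }
}

class Page {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private byte[] bytes;
    private boolean compressed;

    Page(byte[] bytes) { this.bytes = bytes; }

    /** Step 1: called when the page's free space drops below the threshold (e.g. 20%). */
    void onLowFreeSpace(PageCompressor compressor) {
        lock.writeLock().lock();                        // step 2: page write lock
        try {
            byte[] packed = compressor.compress(bytes); // step 3: compressor implementation
            if (packed != null) {
                bytes = packed;                         // step 4: replace with compressed page
                compressed = true;
            }
        }
        finally {
            lock.writeLock().unlock();
        }
    }
}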

Reading a whole object or a field:
1) If the page is marked as compressed, it is handled by the page
compressor implementation; otherwise it is handled as usual.
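The read path, under the same assumptions: the "compressed" flag and the plain
page size are illustrative, and java.util.zip.Inflater reverses the Deflater
used in the sketch above.

import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

class PageReader {
    /** Returns the plain page image, decompressing it first if the page is marked as compressed. */
    byte[] readPage(byte[] pageBytes, boolean compressed, int plainPageSize) throws DataFormatException {
        if (!compressed)
            return pageBytes;                  // regular page: handled as usual

        Inflater inflater = new Inflater();
        inflater.setInput(pageBytes);

        byte[] plain = new byte[plainPageSize];
        int len = inflater.inflate(plain);     // compressed page: handled by the compressor implementation
        inflater.end();

        if (len != plainPageSize)
            throw new DataFormatException("Unexpected decompressed page size: " + len);

        return plain;
    }
}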

Thoughts?

Should we create a new IEP and register tickets to start the implementation?
This will allow us to track the feature's progress and related
tasks.


[1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-tc20679.html


-- 
Best Regards, Vyacheslav D.

Re: Data compression design proposal

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Anton,

Do you have suggestions for this approach?

Sincerely,
Dmitriy Pavlov


Re: Data compression design proposal

Posted by Anton Vinogradov <av...@apache.org>.
Can we use another approach to store compressed pages?


Re: Data compression design proposal

Posted by Dmitry Pavlov <dp...@gmail.com>.
+1 to Alexey's concern. There is no reason to compress if we keep computing
the page offset as pageIdx*pageSize.
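A small illustration of the concern (the constants are assumed, not Ignite's
actual layout): as long as a page's position in the data file is derived from
its index and a fixed page size, compressing the page image does not shrink
the file.

public class PageOffsets {
    static final int PAGE_SIZE = 4096;   // assumed fixed page size

    static long pageOffset(long pageIdx) {
        return pageIdx * PAGE_SIZE;      // every page occupies a full slot, compressed or not
    }

    public static void main(String[] args) {
        // Pages 0..2 start at 0, 4096 and 8192 regardless of their compressed size.
        for (long idx = 0; idx < 3; idx++)
            System.out.println("page " + idx + " -> offset " + pageOffset(idx));
    }
}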


Re: Data compression design proposal

Posted by Anton Vinogradov <av...@apache.org>.
Vova, thanks for the comments.

Anyway, page compression during rebalancing is a good idea even if we have
problems with storing compressed pages on disk.



Re: Data compression design proposal

Posted by Vyacheslav Daradur <da...@gmail.com>.
Since PDS strongly depends on the memory page size, I'd like to
compress the serialized data inside the page, excluding the page header.
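A minimal sketch of that idea, with an assumed header size (40 bytes here,
which is not Ignite's actual page layout): only the payload after the page
header is compressed, so the header stays readable in place.

import java.util.Arrays;
import java.util.zip.Deflater;

class PagePayloadCompressor {
    static final int PAGE_HEADER_SIZE = 40; // assumed header size, for illustration only

    /** Returns header + compressed payload, or the original page if compression does not help. */
    static byte[] compressPayload(byte[] page) {
        Deflater deflater = new Deflater();
        deflater.setInput(page, PAGE_HEADER_SIZE, page.length - PAGE_HEADER_SIZE);
        deflater.finish();

        byte[] packed = new byte[page.length - PAGE_HEADER_SIZE];
        int len = deflater.deflate(packed);
        boolean done = deflater.finished();
        deflater.end();

        if (!done)
            return page;                     // payload did not shrink, keep the page as-is

        // Copy the untouched header, then append the compressed payload.
        byte[] res = Arrays.copyOf(page, PAGE_HEADER_SIZE + len);
        System.arraycopy(packed, 0, res, PAGE_HEADER_SIZE, len);
        return res;
    }
}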

-- 
Best Regards, Vyacheslav D.

Re: Data compression design proposal

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Alex,

In fact there are many approaches to this. Some vendors decided to stick to
the page - the page is filled with data and then compressed when a certain
threshold is reached (e.g. the page is full or filled up to X%). Another
approach is to store data in memory in *larger blocks* than on the disk, and
when it comes to flushing, one may try to compress them. If the final size is
lower than the disk block size, then compression is considered successful and
the data is saved in compressed form. Otherwise the data is saved as is.

Both approaches may work, but IMO compression within a single block is
better and simpler to implement.
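A sketch of the second approach, under assumed sizes (the 4 KB disk block is
illustrative, not an existing Ignite constant): the in-memory block is
compressed on flush and kept compressed only if the result fits into the disk
block size, otherwise it is written as is.

import java.util.Arrays;
import java.util.zip.Deflater;

class FlushCompression {
    static final int DISK_BLOCK_SIZE = 4096; // assumed on-disk block size

    /** Returns the image to write on flush: compressed if it fits into one disk block, otherwise as-is. */
    static byte[] toDiskImage(byte[] inMemoryBlock) {
        Deflater deflater = new Deflater();
        deflater.setInput(inMemoryBlock);
        deflater.finish();

        byte[] buf = new byte[DISK_BLOCK_SIZE];
        int len = deflater.deflate(buf);
        boolean fits = deflater.finished();   // true only if all input was consumed within one block
        deflater.end();

        // Compression is considered successful only if the result is within the disk block size.
        return fits ? Arrays.copyOf(buf, len) : inMemoryBlock;
    }
}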


Re: Data compression design proposal

Posted by Anton Vinogradov <av...@apache.org>.
>> page compression during rebalancing is a good idea even if we have problems
with storing compressed pages on disk.
BTW, do we have, or are we going to have, rebalancing based on page streaming
instead of entry streaming?


Re: Data compression design proposal

Posted by Dmitriy Setrakyan <ds...@apache.org>.
AG,

I would also ask about the compression itself. How and where do we store
the compression meta information? We cannot compress every page
separately - that will not be effective. However, if we try to store the
compression metadata, how do we make other nodes aware of it? Has this been
discussed?

D.


Re: Data compression design proposal

Posted by Alexey Goncharuk <al...@gmail.com>.
Guys,

How does this fit the PageMemory concept? Currently it assumes that the
size of the page in memory and the size of the page on disk are the same, so
only per-entry compression within a page makes sense.

If you compress a whole page, how do you calculate the page offset in the
target data file?

--AG


Re: Data compression design proposal

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Gents,

If I understood the idea correctly, the proposal is to compress pages on
eviction and decompress them on read from disk. Is it correct?


Re: Data compression design proposal

Posted by Anton Vinogradov <av...@apache.org>.
+1 to Taras's vision.

Compression on eviction is a good way to store more.
Pages in memory are always hot in a real system, so compression in memory will
definitely slow down the system, I think.

Anyway, we can split the issue into "on-eviction compression" and "in-memory
compression".



Re: Data compression design proposal

Posted by Taras Ledkov <tl...@gridgain.com>.
Hi,

I guess page-level compression makes sense on page loading / eviction.
In this case we can decrease I/O operations and reach a performance
boost.
What is the goal of in-memory compression? Holding about 2-5x more data in
memory with a performance drop?

Also please clarify the case with compression/decompression for hot and
cold pages.
Is this right for your approach:
1. Hot pages are always kept decompressed in memory because many read/write
operations touch them.
2. So we can compress only the cold pages.

So this approach is suitable when the hot data size << available RAM size.
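A minimal sketch of that policy, with assumed bookkeeping (the access counter
and hotness threshold are illustrative, not existing Ignite code): hot pages
stay uncompressed in memory, and only cold pages are compressed when they are
evicted to disk.

import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

class ColdPageEviction {
    static final long HOT_ACCESS_THRESHOLD = 100; // assumed hotness threshold

    private final AtomicLong accesses = new AtomicLong();

    /** Hot pages are touched often; count accesses and serve the plain page. */
    byte[] onRead(byte[] pageBytes) {
        accesses.incrementAndGet();
        return pageBytes;
    }

    /** On eviction, compress only pages that never became hot. */
    byte[] onEvict(byte[] pageBytes, UnaryOperator<byte[]> compressor) {
        return accesses.get() >= HOT_ACCESS_THRESHOLD
            ? pageBytes                        // hot page: write as-is
            : compressor.apply(pageBytes);     // cold page: compress before writing to disk
    }
}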

Thoughts?


-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com