Posted to dev@ignite.apache.org by daradurvs <da...@gmail.com> on 2017/10/18 15:52:30 UTC

Re: Data compression in Ignite

Hi, Igniters!

Are there any research results or a prototype of the compression feature?



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: Data compression in Ignite

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Dmitry,

Ignite is used by a variety of applications. Some models I saw were made
entirely of strings; others consisted of longs, decimals, etc. It is
impossible to either prove or disprove which data type is dominant. My
position is based on experience with Ignite users and on the approaches
used in other databases.

Strings are more complex because your approach assumes that there is a
common dictionary of strings, referenced from data pages. As soon as you
have cross-page references, you are in trouble, because you need to
maintain that dictionary. With the page-based approach we agreed on
previously, the dictionary is generic (i.e. it can compress not only
strings, but any byte sequence) and is located inside the page, meaning
that all you need to maintain this dictionary is the page lock.
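
A minimal Java sketch of the page-local idea, under stated assumptions:
PageLocalDictionary, put, get and distinctEntries are hypothetical names,
a ReentrantLock stands in for the page lock, and the on-heap collections
do not reflect Ignite's real page layout. It only shows that the
dictionary accepts any byte sequence and that nothing outside the page
refers to it, so the page lock alone keeps it consistent.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a data page that owns its dictionary of byte sequences.
class PageLocalDictionary {
    // Stand-in for the page lock (assumption, not Ignite's implementation).
    private final ReentrantLock pageLock = new ReentrantLock();

    // Each distinct byte sequence is stored once per page.
    private final List<byte[]> entries = new ArrayList<>();

    // Content -> slot lookup; ByteBuffer equals/hashCode compare contents.
    private final Map<ByteBuffer, Integer> index = new HashMap<>();

    // Stores any byte sequence (a string or not) and returns its slot in this page.
    int put(byte[] value) {
        pageLock.lock();
        try {
            ByteBuffer key = ByteBuffer.wrap(value.clone());
            Integer slot = index.get(key);
            if (slot != null)
                return slot; // Duplicate within this page: reuse the existing entry.
            entries.add(key.array());
            slot = entries.size() - 1;
            index.put(key, slot);
            return slot;
        }
        finally {
            pageLock.unlock();
        }
    }

    // Returns a copy of the bytes stored in the given slot.
    byte[] get(int slot) {
        pageLock.lock();
        try {
            return entries.get(slot).clone();
        }
        finally {
            pageLock.unlock();
        }
    }

    int distinctEntries() {
        pageLock.lock();
        try {
            return entries.size();
        }
        finally {
            pageLock.unlock();
        }
    }

    public static void main(String[] args) {
        PageLocalDictionary page = new PageLocalDictionary();
        int a = page.put("Moscow".getBytes(StandardCharsets.UTF_8));
        int b = page.put(new byte[] {1, 2, 3});
        int c = page.put("Moscow".getBytes(StandardCharsets.UTF_8));
        System.out.println(a == c);                 // true: same bytes, same slot
        System.out.println(b);                      // 1
        System.out.println(page.distinctEntries()); // 2
    }
}
```

Because slot ids are only meaningful inside one page, evicting or
rewriting a page needs no coordination with other pages; the trade-off is
that a value repeated across many pages is stored once per page.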

On Fri, Nov 10, 2017 at 7:02 PM, Dmitry Pavlov <dp...@gmail.com>
wrote:

> Hi Vladimir,
>
> In my experience, String is a commonly used data type in business
> applications, and it is often indexed as well.
> > String type doesn't dominate in user models
> What is the basis of this assumption?
>
> Could you explain why String is more complex than byte[] compression? It
> seems they both require dictionaries.
>
> Sincerely,
> Dmitriy Pavlov

Re: Data compression in Ignite

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Vladimir,

In my experience, String is a commonly used data type in business
applications, and it is often indexed as well.
> String type doesn't dominate in user models
What is the basis of this assumption?

Could you explain why String is more complex than byte[] compression? It
seems they both require dictionaries.

Sincerely,
Dmitriy Pavlov

On Fri, Nov 10, 2017 at 18:57, Vladimir Ozerov <vo...@gridgain.com> wrote:

> This would require a shared dictionary, which is complex to maintain. We
> evaluated this option but rejected it due to its complexity. Another
> important point is that the String type doesn't dominate in user models,
> so I do not see why it should be a special case.

Re: Data compression in Ignite

Posted by Vladimir Ozerov <vo...@gridgain.com>.
This would require a shared dictionary, which is complex to maintain. We
evaluated this option but rejected it due to its complexity. Another
important point is that the String type doesn't dominate in user models,
so I do not see why it should be a special case.
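
To illustrate the maintenance cost mentioned above, here is a hypothetical
sketch of a dictionary shared across pages (SharedDictionary, acquire and
release are illustrative names, not the design that was evaluated). Every
page that stores or deletes a value must update a reference count in one
shared structure, so a single global monitor is used here; an off-heap,
persistent version would additionally need its own concurrency control
and crash recovery.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a cross-page dictionary with reference counting.
class SharedDictionary {
    private static final class Entry {
        final int id;
        int refCount;

        Entry(int id) {
            this.id = id;
        }
    }

    private final Map<ByteBuffer, Entry> byContent = new HashMap<>();
    private final Map<Integer, ByteBuffer> byId = new HashMap<>();
    private int nextId;

    // Called whenever ANY page stores a value; returns the shared entry id.
    synchronized int acquire(byte[] value) {
        ByteBuffer key = ByteBuffer.wrap(value.clone());
        Entry e = byContent.get(key);
        if (e == null) {
            e = new Entry(nextId++);
            byContent.put(key, e);
            byId.put(e.id, key);
        }
        e.refCount++;
        return e.id;
    }

    // Called whenever ANY page deletes or overwrites a value; frees the entry at zero.
    synchronized void release(int id) {
        ByteBuffer key = byId.get(id);
        if (key == null)
            throw new IllegalArgumentException("Unknown entry id: " + id);
        Entry e = byContent.get(key);
        if (--e.refCount == 0) {
            byContent.remove(key);
            byId.remove(id);
        }
    }

    synchronized int size() {
        return byContent.size();
    }

    public static void main(String[] args) {
        SharedDictionary dict = new SharedDictionary();
        int onPage1 = dict.acquire("Ignite".getBytes(StandardCharsets.UTF_8));
        int onPage2 = dict.acquire("Ignite".getBytes(StandardCharsets.UTF_8));
        System.out.println(onPage1 == onPage2); // true
        dict.release(onPage1);                  // row deleted on page 1
        System.out.println(dict.size());        // 1: page 2 still references it
        dict.release(onPage2);                  // row deleted on page 2
        System.out.println(dict.size());        // 0
    }
}
```

Compared with a page-local dictionary, every insert and delete on any page
now contends on the same structure, which is the complexity argued against
here.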

On Fri, Nov 10, 2017 at 18:45, Dmitry Pavlov <dp...@gmail.com> wrote:

> Vladimir,
>
> Focusing on strings would also allow us to deduplicate strings in objects
> during unmarshalling from a page into the heap.
>
> Moreover, this could be a simple first step toward implementing a more
> complex algorithm.
>
> Sincerely,
> Dmitriy Pavlov

Re: Data compression in Ignite

Posted by Dmitry Pavlov <dp...@gmail.com>.
Vladimir,

Focusing on strings would also allow us to deduplicate strings in objects
during unmarshalling from a page into the heap.

Moreover, this could be a simple first step toward implementing a more
complex algorithm.
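
A minimal sketch of deduplication at unmarshalling time, assuming rows
arrive as UTF-8 byte[] read from a page; StringDeduplicator and decode are
hypothetical names, not Ignite API. Equal strings are mapped to one
canonical heap instance, similar in spirit to String.intern(); note that
this differs from G1, which keeps separate String objects and shares only
their backing arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: canonicalize strings while unmarshalling rows into the heap.
class StringDeduplicator {
    // Canonical instance for each distinct string content seen so far (unbounded).
    private final ConcurrentMap<String, String> canonical = new ConcurrentHashMap<>();

    // Decodes UTF-8 bytes taken from a page and returns the shared instance.
    String decode(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        String prev = canonical.putIfAbsent(s, s);
        return prev != null ? prev : s;
    }

    int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringDeduplicator dedup = new StringDeduplicator();
        // Two rows that contain the same city name.
        String a = dedup.decode("Saint Petersburg".getBytes(StandardCharsets.UTF_8));
        String b = dedup.decode("Saint Petersburg".getBytes(StandardCharsets.UTF_8));
        System.out.println(a == b);       // true: one heap instance for both rows
        System.out.println(dedup.size()); // 1
    }
}
```

The canonical map is unbounded here; a real implementation would need a
size limit or weak references so that the map itself does not become a
memory leak.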

Sincerely,
Dmitriy Pavlov

On Fri, Nov 10, 2017 at 18:19, Vladimir Ozerov <vo...@gridgain.com> wrote:

> Dmitry,
>
> What we've discussed so far in this topic is essentially the same concept.
> We will deduplicate identical byte sequences at the page level.

Re: Data compression in Ignite

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Dmitry,

What we've discussed so far in this topic is essentially the same concept.
We will deduplicate identical byte sequences at the page level.

On Fri, Nov 10, 2017 at 6:10 PM, Dmitry Pavlov <dp...@gmail.com>
wrote:

> Hi Igniters,
>
> What do you think about implementing an analogue of the Java G1 collector
> feature 'String deduplication' (-XX:+UseG1GC -XX:+UseStringDeduplication)?
>
> In most business applications, almost all objects are of type String. As a
> result, char[] arrays are often at the top of heap usage. To reduce the
> memory consumed by duplicates, the G1 collector identifies, in the
> background, strings with equal char[] arrays and makes them share a single
> array instance (this is safe because String is immutable).
> Unfortunately, we can't reuse the collector's feature, as Ignite stores
> data off-heap.
>
> What if we consider implementing the same deduplication feature for Ignite
> Durable Memory?
>
> Sincerely,
> Dmitry Pavlov

Re: Data compression in Ignite

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Igniters,

What do you think about implementing an analogue of the Java G1 collector
feature 'String deduplication' (-XX:+UseG1GC -XX:+UseStringDeduplication)?

In most business applications, almost all objects are of type String. As a
result, char[] arrays are often at the top of heap usage. To reduce the
memory consumed by duplicates, the G1 collector identifies, in the
background, strings with equal char[] arrays and makes them share a single
array instance (this is safe because String is immutable).
Unfortunately, we can't reuse the collector's feature, as Ignite stores
data off-heap.

What if we consider implementing the same deduplication feature for Ignite
Durable Memory?

Sincerely,
Dmitry Pavlov


On Wed, Oct 18, 2017 at 18:52, daradurvs <da...@gmail.com> wrote:

> Hi, Igniters!
>
> Are there any research results or a prototype of the compression feature?