Posted to dev@ignite.apache.org by Vladimir Ozerov <vo...@gridgain.com> on 2017/08/01 09:15:28 UTC

Re: Data compression in Ignite 2.0

Vyacheslav,

This is not about my needs, but about the product :-) BinaryObject is a
central entity used for both data transfer and data storage. This is both
good and bad at the same time.

The good thing is that as we optimize the binary protocol, we improve both
network and storage performance at the same time. We have at least 3 things
which will be included in the product soon: varint encoding [1], optimized
string encoding [2] and null-field optimization [3]. The bad thing is that
the binary object format is not well suited for data storage optimizations,
including compression. For example, one good compression technique is to
organize data in a column-store format, or to introduce a shared
"dictionary" of unique values at the cache level. In both cases N equal
values are not stored N times. Instead, we store one value and N references
to it. This way 2x-10x compression is possible depending on the workload
type. The binary object protocol with some compression on top of it cannot
give such an improvement, because it compresses data within individual
objects instead of compressing the whole cache data in a single context.
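
To make the shared-dictionary idea concrete, here is a minimal sketch. It is
only an illustration under my own assumptions: DictionaryColumn and its
methods are hypothetical names, not an existing Ignite API, and a real
implementation would work on off-heap pages rather than Java collections.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a per-column dictionary; not part of Ignite. */
final class DictionaryColumn {
    /** Unique values in first-seen order. */
    private final List<String> dictionary = new ArrayList<>();

    /** Maps a value to its position in the dictionary. */
    private final Map<String, Integer> index = new HashMap<>();

    /** One small dictionary reference per stored row. */
    private final List<Integer> refs = new ArrayList<>();

    /** Stores a value: new values extend the dictionary, repeats only add a reference. */
    public void add(String value) {
        Integer id = index.get(value);

        if (id == null) {
            id = dictionary.size();
            dictionary.add(value);
            index.put(value, id);
        }

        refs.add(id);
    }

    /** Resolves the reference of the given row back to the original value. */
    public String get(int row) {
        return dictionary.get(refs.get(row));
    }

    public int rowCount() {
        return refs.size();
    }

    public int uniqueCount() {
        return dictionary.size();
    }

    public static void main(String[] args) {
        DictionaryColumn city = new DictionaryColumn();

        for (String v : new String[] {"London", "Paris", "London", "London", "Paris"})
            city.add(v);

        // Five rows are stored, but only two strings are kept in memory.
        System.out.println(city.rowCount() + " rows, " + city.uniqueCount() + " unique values");
        System.out.println("row 3 = " + city.get(3));
    }
}
```

The point of the sketch is that the saving comes from looking at many entries
at once, which is exactly what a per-object compressor cannot do.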

That said, I propose to give up on adding compression to BinaryObject. This
is a dead end. Instead, we should:
1) Optimize the protocol itself to be more compact, as described in the
aforementioned Ignite tickets
2) Start a new discussion about storage compression
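
As an illustration of what "more compact" means in item 1, here is a generic
varint sketch. This is the common scheme of 7 data bits per byte with a
continuation bit; it is an assumption for illustration only, the VarInt class
is a hypothetical name, and the actual format chosen in IGNITE-5097 may
differ.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of varint encoding; not the Ignite implementation. */
final class VarInt {
    /**
     * Encodes an int using 7 data bits per byte. The high bit of each byte
     * signals that more bytes follow. The value is treated as unsigned, so
     * negative ints always take 5 bytes.
     */
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);

        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }

        out.write(value);

        return out.toByteArray();
    }

    /** Decodes a value produced by encode(int). */
    public static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;

        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;

            if ((b & 0x80) == 0)
                return result;

            shift += 7;
        }

        throw new IllegalArgumentException("Truncated varint");
    }
}
```

With this scheme values below 128 take 1 byte instead of 4, which helps both
the network and the storage footprint since small lengths and offsets are
common.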

You can read papers from other vendors to get a better understanding of the
possible compression options. E.g. Oracle has a lot of compression
techniques, including heat maps, background compression, per-block
compression, data dictionaries, etc. [4].

[1] https://issues.apache.org/jira/browse/IGNITE-5097
[2] https://issues.apache.org/jira/browse/IGNITE-5655
[3] https://issues.apache.org/jira/browse/IGNITE-3939
[4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf

Vladimir.


On Tue, Jul 11, 2017 at 6:56 PM, Vyacheslav Daradur <da...@gmail.com>
wrote:

> Hi Igniters!
>
> I'd like to continue developing and discussing compression in Ignite.
>
> Vladimir, could you propose a design for the compression feature in Ignite
> that suits you?
>
> 2017-06-15 16:13 GMT+03:00 Vyacheslav Daradur <da...@gmail.com>:
>
>> Hi Igniters.
>>
>> Vladimir, I want to propose another design for an implementation of
>> per-field compression.
>>
>> 1) We will add a new step in the prepareForCache method (for example) of
>> CacheObject, or in GridCacheMapEntry.
>>
>> At this step, after marshalling an object, we will compress the fields of
>> the object that were described in advance.
>> The user will describe the class fields which he wants to compress in
>> another entity, like Metadata.
>>
>> For compression, we will introduce another entity, for example
>> CompressionProcessor, which will work with the byte array (marshalled
>> object).
>> The entity will read the byte arrays of the described fields, compress
>> them and rewrite the binary representation of the whole object.
>> After processing, the object will be put in the cache.
>>
>> In this case the design is not tied to the binary infrastructure.
>> But there is a big heap-memory overhead for the buffer.
>>
>> 2) Another solution is to compress the byte array of the whole object when
>> copying it to off-heap.
>> But in this case I don't understand yet how to support querying and
>> indexing.
>>
>>
>> 2017-06-09 11:21 GMT+03:00 Sergey Kozlov <sk...@gridgain.com>:
>>
>>> Hi
>>>
>>> * "Per-field compression" is applicable for huge BLOB fields and will
>>> impose restrictions such as being unable to index such fields, slower
>>> data reads, and potential OOM issues if the compression ratio is too high.
>>> But for some cases it makes sense.
>>>
>>> On Fri, Jun 9, 2017 at 11:11 AM, Антон Чураев <ch...@gmail.com>
>>> wrote:
>>>
>>> > Seems that Dmitry is referring to transparent data encryption. It is
>>> > used throughout the whole database industry.
>>> >
>>> > 2017-06-09 10:50 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
>>> >
>>> > > Dima,
>>> > >
>>> > > Encryption of certain fields is as bad as compression. First, it is a
>>> > > huge change, which makes the already complex binary protocol even more
>>> > > complex.
>>> > > Second, it has to be ported to the CPP and .NET platforms, as well as
>>> > > to JDBC and ODBC.
>>> > > Last, but most important - it is not our headache to encrypt
>>> > > sensitive data. This is the user's responsibility. Nobody in their
>>> > > right mind will store passwords in plain form. Instead, the user should
>>> > > encrypt them on his own, choosing proper encryption parameters -
>>> > > algorithms, key lengths, salts, etc. How are you going to expose this
>>> > > in the API or configuration?
>>> > >
>>> > > We should not implement data encryption on the binary level, this is
>>> > > out of the question. Encryption should be implemented on the
>>> > > application level (user efforts), the transport layer (SSL - we already
>>> > > have it), and possibly on the disk level (there are tools for this
>>> > > already).
>>> > >
>>> > >
>>> > > On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur <daradurvs@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > >> which is much less useful.
>>> > > > I note that in some cases the gain is more than twofold in terms of
>>> > > > object size.
>>> > > >
>>> > > > >> Would it be possible to change your implementation to handle the
>>> > > > >> encryption instead?
>>> > > > Yes, of course, there's not much difference between compression and
>>> > > > encryption, including in my implementation of per-field compression.
>>> > > >
>>> > > > 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>>> > > >
>>> > > > > Vyacheslav,
>>> > > > >
>>> > > > > When this feature started out as data compression in Ignite, it
>>> > > > > sounded very useful. Now it is unfolding as per-field compression,
>>> > > > > which is much less useful. In fact, it is questionable whether it is
>>> > > > > useful at all. The fact that this feature is implemented does not
>>> > > > > make it mandatory for the community to accept it.
>>> > > > >
>>> > > > > However, as I mentioned before, per-field encryption is very useful,
>>> > > > > as it would allow users to automatically encrypt certain sensitive
>>> > > > > fields, like passwords, credit card numbers, etc. There is not much
>>> > > > > conceptual difference between compressing a field vs encrypting a
>>> > > > > field. Would it be possible to change your implementation to handle
>>> > > > > the encryption instead?
>>> > > > >
>>> > > > > D.
>>> > > > >
>>> > > > > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur <daradurvs@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Guys, I want to be clear:
>>> > > > > > * The "per-field compression" design is the result of researching
>>> > > > > > the binary infrastructure of Ignite and some of its other parts
>>> > > > > > (querying, indexing, etc.)
>>> > > > > > * Full compression of an object would be more effective, but in
>>> > > > > > that case there is no compatibility with querying and indexing (or
>>> > > > > > there is a large overhead from decompressing the full object (or
>>> > > > > > cache pages) on demand)
>>> > > > > > * "Per-field compression" is one of the ways to implement the
>>> > > > > > compression feature
>>> > > > > >
>>> > > > > > I'm new to Ignite, so I can be mistaken about some things.
>>> > > > > > For the last 3-4 months I've tried to start a discussion about a
>>> > > > > > design, but nobody has answered (except Dmitry and Valentin, who
>>> > > > > > were interested in how it works).
>>> > > > > > But I understand that this is a community and nobody is obliged to
>>> > > > > > anybody.
>>> > > > > >
>>> > > > > > There are strong Ignite experts here.
>>> > > > > > If they can help me and the community with a design of the
>>> > > > > > compression feature, it will be great.
>>> > > > > > At the moment I have the desire and the time to be engaged in the
>>> > > > > > development of the compression feature in Ignite.
>>> > > > > > Let's use this opportunity :)
>>> > > > > >
>>> > > > > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>>> > > > > >
>>> > > > > > > Igniters,
>>> > > > > > >
>>> > > > > > > I have never seen a single Ignite user asking about compressing a
>>> > > > > > > single field. However, we have had requests to secure certain
>>> > > > > > > fields, e.g. passwords.
>>> > > > > > >
>>> > > > > > > I personally do not think per-field compression is needed, unless
>>> > > > > > > we can point out some concrete real-life use cases.
>>> > > > > > >
>>> > > > > > > D.
>>> > > > > > >
>>> > > > > > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur <daradurvs@gmail.com>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Anton,
>>> > > > > > > >
>>> > > > > > > > >> I thought that if compressed data is stored in memory, the
>>> > > > > > > > >> data will be transmitted over the wire compressed too. Is
>>> > > > > > > > >> that right?
>>> > > > > > > >
>>> > > > > > > > In the per-field compression case - yes.
>>> > > > > > > >
>>> > > > > > > > 2017-06-08 13:36 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>> > > > > > > >
>>> > > > > > > > > Guys, could you please help me.
>>> > > > > > > > > I thought that if compressed data is stored in memory, the
>>> > > > > > > > > data will be transmitted over the wire compressed too. Is
>>> > > > > > > > > that right?
>>> > > > > > > > >
>>> > > > > > > > > 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>> > > > > > > > >
>>> > > > > > > > > > Vladimir,
>>> > > > > > > > > >
>>> > > > > > > > > > The main problem which I'm trying to solve is storing data
>>> > > > > > > > > > in memory in a compressed form via Ignite.
>>> > > > > > > > > > The main goal is to use memory more effectively.
>>> > > > > > > > > >
>>> > > > > > > > > > >> here the much simpler step would be full compression on
>>> > > > > > > > > > >> a per-cache basis rather than dealing with the per-field
>>> > > > > > > > > > >> case.
>>> > > > > > > > > >
>>> > > > > > > > > > Please explain your idea. Compress data by memory page?
>>> > > > > > > > > > Is it compatible with querying and indexing?
>>> > > > > > > > > >
>>> > > > > > > > > > >> In the end, if a user would like to compress a
>>> > > > > > > > > > >> particular field, he can always do it on his own
>>> > > > > > > > > > I think we mustn't think this way: if a user needs
>>> > > > > > > > > > something, he tries to choose a tool which has this
>>> > > > > > > > > > feature OOTB.
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > > > 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>>> > > > > > > > > >
>>> > > > > > > > > > > Igniters,
>>> > > > > > > > > > >
>>> > > > > > > > > > > Honestly, I still do not see how to apply this feature
>>> > > > > > > > > > > gracefully to Ignite. And the overall approach of
>>> > > > > > > > > > > compressing only particular fields looks overcomplicated
>>> > > > > > > > > > > to me. Remember that our main use case is an application
>>> > > > > > > > > > > without classes on the server. It means that annotations
>>> > > > > > > > > > > of any kind are inapplicable. To be more precise: a
>>> > > > > > > > > > > proper API should be implemented to handle the no-class
>>> > > > > > > > > > > case (e.g. how would a user build such an object through
>>> > > > > > > > > > > BinaryBuilder without a class?), and only then should
>>> > > > > > > > > > > annotations be added as a convenient addition to the
>>> > > > > > > > > > > more basic API.
>>> > > > > > > > > > >
>>> > > > > > > > > > > It seems to me that a full implementation, which takes
>>> > > > > > > > > > > into account a proper "classless" API, changes to binary
>>> > > > > > > > > > > metadata to reflect compressed fields, changes to SQL,
>>> > > > > > > > > > > changes to the binary protocol, and porting to .NET and
>>> > > > > > > > > > > CPP, will yield a very complex solution with little
>>> > > > > > > > > > > value to the product.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Instead, as I proposed earlier, it seems that we'd
>>> > > > > > > > > > > better start with the problem we are trying to solve.
>>> > > > > > > > > > > Basically, compression could help in two cases:
>>> > > > > > > > > > > 1) Transmitting data over the wire - it should be
>>> > > > > > > > > > > implemented on the communication layer and should not
>>> > > > > > > > > > > affect the binary serialization component a lot.
>>> > > > > > > > > > > 2) Storing data in memory - here the much simpler step
>>> > > > > > > > > > > would be full compression on a per-cache basis rather
>>> > > > > > > > > > > than dealing with the per-field case.
>>> > > > > > > > > > >
>>> > > > > > > > > > > In the end, if a user would like to compress a
>>> > > > > > > > > > > particular field, he can always do it on his own, and
>>> > > > > > > > > > > set the already compressed field to our BinaryObject.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Vladimir.
>>> > > > > > > > > > >
>>> > > > > > > > > > >
>>> > > > > > > > > > > On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur <daradurvs@gmail.com>
>>> > > > > > > > > > > wrote:
>>> > > > > > > > > > >
>>> > > > > > > > > > > > Valentin,
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > Yes, I have the prototype [1][2].
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > You can see an example of a Java class [3] that I used
>>> > > > > > > > > > > > in my benchmark.
>>> > > > > > > > > > > > For example:
>>> > > > > > > > > > > > class Foo {
>>> > > > > > > > > > > >     @BinaryCompression
>>> > > > > > > > > > > >     String data;
>>> > > > > > > > > > > > }
>>> > > > > > > > > > > > If the user decides to store the object in compressed
>>> > > > > > > > > > > > form, he can use the annotation @BinaryCompression as
>>> > > > > > > > > > > > shown above.
>>> > > > > > > > > > > > It means the annotated field 'data' will be compressed
>>> > > > > > > > > > > > at marshalling.
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > [1] https://github.com/apache/ignite/pull/1951
>>> > > > > > > > > > > > [2] https://issues.apache.org/jira/browse/IGNITE-5226
>>> > > > > > > > > > > > [3] https://github.com/daradurvs/ignite-compression/blob/master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java
>>> > > > > > > > > > > >
>>> > > > > > > > > > > >
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <valentin.kulichenko@gmail.com>:
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > > Vyacheslav, Anton,
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > Are there any ideas and/or prototypes for the API?
>>> > > > > > > > > > > > > Your design suggestions seem to make sense, but I
>>> > > > > > > > > > > > > would like to see how all this will look from the
>>> > > > > > > > > > > > > user's standpoint.
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > -Val
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев <churaev.an@gmail.com>
>>> > > > > > > > > > > > > wrote:
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > > Vyacheslav, correct me if something is wrong.
>>> > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > We could give users the opportunity to choose
>>> > > > > > > > > > > > > > between CPU usage and MEM/NET usage by compressing
>>> > > > > > > > > > > > > > some attributes of stored objects.
>>> > > > > > > > > > > > > > You have studied the design, and it is possible to
>>> > > > > > > > > > > > > > localize the changes in marshalling without
>>> > > > > > > > > > > > > > affecting performance and current functionality.
>>> > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > I think that it's useful for our project and users.
>>> > > > > > > > > > > > > > Community, what do you think about this proposal?
>>> > > > > > > > > > > > > >
>>> > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>> > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > In short,
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > During marshalling a field is represented as a
>>> > > > > > > > > > > > > > > BinaryFieldAccessor, which manages its
>>> > > > > > > > > > > > > > > marshalling. It checks whether the field is
>>> > > > > > > > > > > > > > > marked with the annotation @BinaryCompression; in
>>> > > > > > > > > > > > > > > that case the binary representation of the field
>>> > > > > > > > > > > > > > > (byte array) will be compressed. It will be
>>> > > > > > > > > > > > > > > marked as compressed by a type constant
>>> > > > > > > > > > > > > > > (GridBinaryMarshaller.COMPRESSED), after which
>>> > > > > > > > > > > > > > > the compressed byte array will be included in the
>>> > > > > > > > > > > > > > > binary representation of the whole object. Note
>>> > > > > > > > > > > > > > > that the header of the marshalled object will not
>>> > > > > > > > > > > > > > > be compressed. Compression affects only the
>>> > > > > > > > > > > > > > > object's field representation.
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > Objects in IgniteCache are represented as
>>> > > > > > > > > > > > > > > BinaryObject, which is a wrapper over the byte
>>> > > > > > > > > > > > > > > array of the marshalled object.
>>> > > > > > > > > > > > > > > BinaryObject provides some useful methods, which
>>> > > > > > > > > > > > > > > are used by Ignite systems.
>>> > > > > > > > > > > > > > > For example, queries use the BinaryObject#field
>>> > > > > > > > > > > > > > > method, which deserializes only a field of the
>>> > > > > > > > > > > > > > > object, without deserializing the whole object.
>>> > > > > > > > > > > > > > > During deserialization, if the BinaryObject#field
>>> > > > > > > > > > > > > > > method meets the constant of the compressed type,
>>> > > > > > > > > > > > > > > it decompresses this byte array, then continues
>>> > > > > > > > > > > > > > > unmarshalling as usual.
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > Now, I have introduced the Compressor interface
>>> > > > > > > > > > > > > > > in IgniteConfigurations; it allows the user to
>>> > > > > > > > > > > > > > > use his own implementation of the compressor - it
>>> > > > > > > > > > > > > > > is the requirement in the task [1].
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > As far as I know, Vladimir Ozerov doesn't like
>>> > > > > > > > > > > > > > > the idea of granting this opportunity to the user.
>>> > > > > > > > > > > > > > > In that case we can choose a compression
>>> > > > > > > > > > > > > > > algorithm which we will provide by default, and
>>> > > > > > > > > > > > > > > move the interface to the internals of the binary
>>> > > > > > > > > > > > > > > infrastructure.
>>> > > > > > > > > > > > > > > For this case I've prepared the benchmarks, which
>>> > > > > > > > > > > > > > > I sent earlier.
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > I vote for the ZSTD algorithm [2]; it provides a
>>> > > > > > > > > > > > > > > good compression ratio and good throughput. It
>>> > > > > > > > > > > > > > > has implementations in Java, .NET and C++, and
>>> > > > > > > > > > > > > > > has an ASF-friendly license, so we can use it on
>>> > > > > > > > > > > > > > > all the Ignite platforms.
>>> > > > > > > > > > > > > > > You can look at an assessment of this algorithm
>>> > > > > > > > > > > > > > > in my benchmarks.
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-3592
>>> > > > > > > > > > > > > > > [2] https://github.com/facebook/zstd
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > 2017-06-06 16:02 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>> > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > Looks good to me.
>>> > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > Could you propose a design of the
>>> > > > > > > > > > > > > > > > implementation in a couple of sentences?
>>> > > > > > > > > > > > > > > > So that we can estimate the completeness and
>>> > > > > > > > > > > > > > > > complexity of the proposal.
>>> > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>> > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > Anton,
>>> > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > Of course, the solution does not affect the
>>> > > > > > > > > > > > > > > > > existing implementation. I mean, there are no
>>> > > > > > > > > > > > > > > > > changes if the user does not use the
>>> > > > > > > > > > > > > > > > > annotation @BinaryCompression (no performance
>>> > > > > > > > > > > > > > > > > changes).
>>> > > > > > > > > > > > > > > > > Only if the user decides to use compression
>>> > > > > > > > > > > > > > > > > on a specific field or fields of a class will
>>> > > > > > > > > > > > > > > > > compression be applied at marshalling to the
>>> > > > > > > > > > > > > > > > > annotated fields.
>>> > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>> > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > Vyacheslav,
>>> > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > Is it possible to propose an implementation
>>> > > > > > > > > > > > > > > > > > that can be switched on on demand?
>>> > > > > > > > > > > > > > > > > > In this case it should not affect the
>>> > > > > > > > > > > > > > > > > > performance of the current solution.
>>> > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > I mean that users should decide what is
>>> > > > > > > > > > > > > > > > > > more important for them: throughput or
>>> > > > > > > > > > > > > > > > > > memory/net usage.
>>> > > > > > > > > > > > > > > > > > Maybe they will choose not all objects, or
>>> > > > > > > > > > > > > > > > > > only some attributes of objects, for
>>> > > > > > > > > > > > > > > > > > compression.
>>> > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>> > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > Conclusion:
>>> > > > > > > > > > > > > > > > > > > The provided solution allows reducing the
>>> > > > > > > > > > > > > > > > > > > size of an object in IgniteCache at the
>>> > > > > > > > > > > > > > > > > > > cost of a throughput reduction (small in
>>> > > > > > > > > > > > > > > > > > > some cases); it depends on the part of
>>> > > > > > > > > > > > > > > > > > > the object which will be compressed and
>>> > > > > > > > > > > > > > > > > > > on the compression algorithm.
>>> > > > > > > > > > > > > > > > > > > I mean, we can make more effective use of
>>> > > > > > > > > > > > > > > > > > > memory, and in some cases it can reduce
>>> > > > > > > > > > > > > > > > > > > the load on the interconnect
>>> > > > > > > > > > > > > > > > > > > (replication, rebalancing).
>>> > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > It will be particularly useful for object
>>> > > > > > > > > > > > > > > > > > > fields which are large text (>~ 250
>>> > > > > > > > > > > > > > > > > > > bytes) and can be effectively compressed.
>>> > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>> > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > Vyacheslav, thank you! But could you
>>> > > > > > > > > > > > > > > > > > > > please provide conclusions or proposals
>>> > > > > > > > > > > > > > > > > > > > based on these benchmarks?
>>> > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>> > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > Dmitry,
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > Excel pages:
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > 1) "Compression ratio (2)" - shows object size with
>>> > > > > > > > > > > > > > > > > > > > > compression and without compression. (Conditions:
>>> > > > > > > > > > > > > > > > > > > > > literal text)
>>> > > > > > > > > > > > > > > > > > > > > The 1st graph shows the compression ratios of different
>>> > > > > > > > > > > > > > > > > > > > > compression algorithms depending on the size of the
>>> > > > > > > > > > > > > > > > > > > > > compressed field.
>>> > > > > > > > > > > > > > > > > > > > > The 2nd graph shows the evaluation of object size
>>> > > > > > > > > > > > > > > > > > > > > depending on sizes and compression algorithms.
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > 2) "Compression ratio (1)" - shows object size with
>>> > > > > > > > > > > > > > > > > > > > > compression and without compression. (Conditions:
>>> > > > > > > > > > > > > > > > > > > > > poorly compressible character sequence)
>>> > > > > > > > > > > > > > > > > > > > > The 1st graph shows the compression ratios of different
>>> > > > > > > > > > > > > > > > > > > > > compression algorithms depending on the size of the
>>> > > > > > > > > > > > > > > > > > > > > compressed field.
>>> > > > > > > > > > > > > > > > > > > > > The 2nd graph shows the evaluation of object size
>>> > > > > > > > > > > > > > > > > > > > > depending on sizes and compression algorithms.
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > 3) "put-avg" - shows the average time of the "put"
>>> > > > > > > > > > > > > > > > > > > > > operation depending on size and compression algorithms.
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > 4) "put-thrpt" - shows the throughput of the "put"
>>> > > > > > > > > > > > > > > > > > > > > operation depending on size and compression algorithms.
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > 5) "get-avg" - shows the average time of the "get"
>>> > > > > > > > > > > > > > > > > > > > > operation depending on size and compression algorithms.
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > 6) "get-thrpt" - shows the throughput of the "get"
>>> > > > > > > > > > > > > > > > > > > > > operation depending on size and compression algorithms.
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > 2017-06-06 10:59 GMT+03:00
>>> Dmitriy
>>> > > > > Setrakyan
>>> > > > > > <
>>> > > > > > > > > > > > > > > > > dsetrakyan@apache.org
>>> > > > > > > > > > > > > > > > > > >:
>>> > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > Vladimir, I am not sure how to
>>> > > > interpret
>>> > > > > > the
>>> > > > > > > > > > graphs?
>>> > > > > > > > > > > > What
>>> > > > > > > > > > > > > > are
>>> > > > > > > > > > > > > > > > we
>>> > > > > > > > > > > > > > > > > > > > looking
>>> > > > > > > > > > > > > > > > > > > > > > at?
>>> > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 6, 2017 at 12:33
>>> AM,
>>> > > > > Vyacheslav
>>> > > > > > > > > > Daradur <
>>> > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>> > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > wrote:
>>> > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > Hi, Igniters.
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > I've prepared some
>>> benchmarking.
>>> > > > > Results
>>> > > > > > > [1].
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > And I've prepared the
>>> evaluation
>>> > in
>>> > > > the
>>> > > > > > > form
>>> > > > > > > > of
>>> > > > > > > > > > > > > diagrams
>>> > > > > > > > > > > > > > > [2].
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > I hope that helps to
>>> interest the
>>> > > > > > community
>>> > > > > > > > and
>>> > > > > > > > > > > > > > > accelerates a
>>> > > > > > > > > > > > > > > > > > > > reaction
>>> > > > > > > > > > > > > > > > > > > > > to
>>> > > > > > > > > > > > > > > > > > > > > > > this improvement :)
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > [1]
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> https://github.com/daradurvs/
>>> > > > > > > > > > > > ignite-compression/tree/
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> master/src/main/resources/result
>>> > > > > > > > > > > > > > > > > > > > > > > [2]
>>> > https://drive.google.com/file/
>>> > > d/
>>> > > > > > > > > > > > > > > > > > 0B2CeUAOgrHkoMklyZ25YTEdKcEk/
>>> > > > > > > > > > > > > > > > > > > > view
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > 2017-05-24 9:49 GMT+03:00
>>> > > Vyacheslav
>>> > > > > > > Daradur
>>> > > > > > > > <
>>> > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>> > > > > > > > > > > > > > > > > > > >:
>>> > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > > Guys, any thoughts?
>>> > > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > > 2017-05-16 13:40 GMT+03:00
>>> > > > Vyacheslav
>>> > > > > > > > > Daradur <
>>> > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>> > > > > > > > > > > > > > > > > > > > >:
>>> > > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > >> Hi guys,
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >> I've prepared the PR to
>>> show
>>> > my
>>> > > > > idea.
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> https://github.com/apache/
>>> > > > > > > > > > > ignite/pull/1951/files
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >> About querying - I've just
>>> > > copied
>>> > > > > > > existing
>>> > > > > > > > > > tests
>>> > > > > > > > > > > > and
>>> > > > > > > > > > > > > > > have
>>> > > > > > > > > > > > > > > > > > > > annotated
>>> > > > > > > > > > > > > > > > > > > > > > the
>>> > > > > > > > > > > > > > > > > > > > > > > >> testing data.
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> https://github.com/apache/
>>> > > > > > > > > > > > > > ignite/pull/1951/files#diff-
>>> > > > > > > > > > > > > > > > > c19a9d
>>> > > > > > > > > > > > > > > > > > > > > > > >> f4058141d059bb577e75244764
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >> It means fields which
>>> will be
>>> > > > marked
>>> > > > > > by
>>> > > > > > > > > > > > > > > @BinaryCompression
>>> > > > > > > > > > > > > > > > > > will
>>> > > > > > > > > > > > > > > > > > > be
>>> > > > > > > > > > > > > > > > > > > > > > > >> compressed at marshalling
>>> via
>>> > > > > > > > > > BinaryMarshaller.
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >> This solution has no
>>> effect on
>>> > > > > > existing
>>> > > > > > > > data
>>> > > > > > > > > > or
>>> > > > > > > > > > > > > > project
>>> > > > > > > > > > > > > > > > > > > > > architecture.
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >> I'll be glad to see your
>>> > > thoughts.
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00
>>> > > > > Vyacheslav
>>> > > > > > > > > Daradur
>>> > > > > > > > > > <
>>> > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>> > > > > > > > > > > > > > > > > > > > > >:
>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>> Dmitriy,
>>> > > > > > > > > > > > > > > > > > > > > > > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>> I have ready prototype. I
>>> > want
>>> > > to
>>> > > > > > show
>>> > > > > > > > it.
>>> > > > > > > > > > > > > > > > > > > > > > > >>> It is always easier to
>>> > discuss
>>> > > on
>>> > > > > > > > example.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>> 2017-05-15 19:02
>>> GMT+03:00
>>> > > > Dmitriy
>>> > > > > > > > > Setrakyan
>>> > > > > > > > > > <
>>> > > > > > > > > > > > > > > > > > > > > dsetrakyan@apache.org
>>> > > > > > > > > > > > > > > > > > > > > > >:
>>> > > > > > > > > > > > > > > > > > > > > > > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> Vyacheslav,
>>> > > > > > > > > > > > > > > > > > > > > > > >>>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> I think it is a bit
>>> > premature
>>> > > to
>>> > > > > > > > provide a
>>> > > > > > > > > > PR
>>> > > > > > > > > > > > > > without
>>> > > > > > > > > > > > > > > > > > getting
>>> > > > > > > > > > > > > > > > > > > a
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> community
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> consensus on the dev
>>> list.
>>> > > > Please
>>> > > > > > > allow
>>> > > > > > > > > some
>>> > > > > > > > > > > > time
>>> > > > > > > > > > > > > > for
>>> > > > > > > > > > > > > > > > the
>>> > > > > > > > > > > > > > > > > > > > > community
>>> > > > > > > > > > > > > > > > > > > > > > to
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> respond.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> D.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> On Mon, May 15, 2017 at
>>> 6:36
>>> > > AM,
>>> > > > > > > > > Vyacheslav
>>> > > > > > > > > > > > > Daradur
>>> > > > > > > > > > > > > > <
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> daradurvs@gmail.com>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> wrote:
>>> > > > > > > > > > > > > > > > > > > > > > > >>>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > I created the ticket:
>>> > > > > > > > > > > > > > > https://issues.apache.org/jira
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> /browse/IGNITE-5226
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > I'll prepare a PR with
>>> > > > described
>>> > > > > > > > > solution
>>> > > > > > > > > > in
>>> > > > > > > > > > > > > > couple
>>> > > > > > > > > > > > > > > of
>>> > > > > > > > > > > > > > > > > > days.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > 2017-05-15 15:05
>>> GMT+03:00
>>> > > > > > > Vyacheslav
>>> > > > > > > > > > > Daradur
>>> > > > > > > > > > > > <
>>> > > > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>> > > > > > > > > > > > > > > > > > > > > > > >:
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Hi, Igniters!
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Apache 2.0 is
>>> released.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Let's continue the
>>> > > > discussion
>>> > > > > > > about
>>> > > > > > > > a
>>> > > > > > > > > > > > > > compression
>>> > > > > > > > > > > > > > > > > > design.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > At the moment, I
>>> found
>>> > > only
>>> > > > > one
>>> > > > > > > > > solution
>>> > > > > > > > > > > > which
>>> > > > > > > > > > > > > > is
>>> > > > > > > > > > > > > > > > > > > compatible
>>> > > > > > > > > > > > > > > > > > > > > > with
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > querying
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > and indexing, this
>>> is
>>> > > > > > > > > per-objects-field
>>> > > > > > > > > > > > > > > compression.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Per-fields
>>> compression
>>> > > means
>>> > > > > > that
>>> > > > > > > > > > metadata
>>> > > > > > > > > > > > (a
>>> > > > > > > > > > > > > > > > header)
>>> > > > > > > > > > > > > > > > > of
>>> > > > > > > > > > > > > > > > > > > an
>>> > > > > > > > > > > > > > > > > > > > > > object
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> won't
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > be compressed, only
>>> > > > serialized
>>> > > > > > > > values
>>> > > > > > > > > of
>>> > > > > > > > > > > an
>>> > > > > > > > > > > > > > object
>>> > > > > > > > > > > > > > > > > > fields
>>> > > > > > > > > > > > > > > > > > > > (in
>>> > > > > > > > > > > > > > > > > > > > > > > bytes
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> array
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > form) will be
>>> > compressed.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > This solution has
>>> some
>>> > > > > > > contentious
>>> > > > > > > > > > > issues:
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > - small values, like
>>> > > > > primitives
>>> > > > > > > and
>>> > > > > > > > > > short
>>> > > > > > > > > > > > > > arrays -
>>> > > > > > > > > > > > > > > > > there
>>> > > > > > > > > > > > > > > > > > > > isn't
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> sense to
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > compress them;
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > - there is no
>>> possible
>>> > to
>>> > > > use
>>> > > > > > > > > > compression
>>> > > > > > > > > > > > with
>>> > > > > > > > > > > > > > > > > > > > java-predefined
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> types;
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > We can provide an
>>> > > > annotation,
>>> > > > > > > > > > > > > > @IgniteCompression -
>>> > > > > > > > > > > > > > > > for
>>> > > > > > > > > > > > > > > > > > > > > example,
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> which can
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > be used by users for
>>> > > marking
>>> > > > > > > fields
>>> > > > > > > > to
>>> > > > > > > > > > > > > compress.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Any thoughts?
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Maybe someone
>>> already
>>> > have
>>> > > > > ready
>>> > > > > > > > > design?
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > 2017-04-10 11:06
>>> > GMT+03:00
>>> > > > > > > > Vyacheslav
>>> > > > > > > > > > > > Daradur
>>> > > > > > > > > > > > > <
>>> > > > > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >:
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Alexey,
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Yes, I've read it.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Ok, let's discuss
>>> about
>>> > > > > public
>>> > > > > > > API
>>> > > > > > > > > > > design.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> I think we need to
>>> add
>>> > > > some a
>>> > > > > > > > > configure
>>> > > > > > > > > > > > > entity
>>> > > > > > > > > > > > > > to
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> CacheConfiguration,
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> which will contain
>>> the
>>> > > > > > Compressor
>>> > > > > > > > > > > interface
>>> > > > > > > > > > > > > > > > > > > implementation
>>> > > > > > > > > > > > > > > > > > > > > and
>>> > > > > > > > > > > > > > > > > > > > > > > some
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > useful
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> parameters.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Or maybe to
>>> provide a
>>> > > > > > > > > BinaryMarshaller
>>> > > > > > > > > > > > > > decorator,
>>> > > > > > > > > > > > > > > > > which
>>> > > > > > > > > > > > > > > > > > > > will
>>> > > > > > > > > > > > > > > > > > > > > be
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> compress
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> data after
>>> marshalling.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> 2017-04-10 10:40
>>> > > GMT+03:00
>>> > > > > > Alexey
>>> > > > > > > > > > > > Kuznetsov <
>>> > > > > > > > > > > > > > > > > > > > > > > akuznetsov@apache.org
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >:
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Vyacheslav,
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Did you read
>>> initial
>>> > > > > > discussion
>>> > > > > > > > [1]
>>> > > > > > > > > > > about
>>> > > > > > > > > > > > > > > > > compression?
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> As far as I
>>> remember
>>> > we
>>> > > > > agreed
>>> > > > > > > to
>>> > > > > > > > > add
>>> > > > > > > > > > > only
>>> > > > > > > > > > > > > > some
>>> > > > > > > > > > > > > > > > > > > > "top-level"
>>> > > > > > > > > > > > > > > > > > > > > > API
>>> > > > > > > > > > > > > > > > > > > > > > > in
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > order
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> to
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> provide a way for
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Ignite users to
>>> inject
>>> > > > some
>>> > > > > > sort
>>> > > > > > > > of
>>> > > > > > > > > > > custom
>>> > > > > > > > > > > > > > > > > > compression.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> [1]
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > http://apache-ignite-developer
>>> > > > > > > > > > > > > > > s.2346864.n4.nabble
>>> > > > > > > > > > > > > > > > .
>>> > > > > > > > > > > > > > > > > > > > > com/Data-c
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > ompression-in-Ignite-2-0-
>>> > > > > > > > > td10099.html
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> On Mon, Apr 10,
>>> 2017
>>> > at
>>> > > > 2:19
>>> > > > > > PM,
>>> > > > > > > > > > > > daradurvs <
>>> > > > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>> > > > > > > > > > > > > > > > > > > > > > > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > wrote:
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Hi Igniters!
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > I am interested
>>> in
>>> > > this
>>> > > > > > task.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Provide some
>>> kind of
>>> > > > > > pluggable
>>> > > > > > > > > > > > compression
>>> > > > > > > > > > > > > > SPI
>>> > > > > > > > > > > > > > > > > > support
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > <
>>> > > > > https://issues.apache.org/
>>> > > > > > > > > > > > > > > > > jira/browse/IGNITE-3592>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > I developed a
>>> > solution
>>> > > > on
>>> > > > > > > > > > > > > > > > BinaryMarshaller-level,
>>> > > > > > > > > > > > > > > > > > but
>>> > > > > > > > > > > > > > > > > > > > > > reviewer
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> has
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> rejected
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > it.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Let's continue
>>> > > > discussion
>>> > > > > of
>>> > > > > > > > task
>>> > > > > > > > > > > goals
>>> > > > > > > > > > > > > and
>>> > > > > > > > > > > > > > > > > solution
>>> > > > > > > > > > > > > > > > > > > > > design.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > As I understood
>>> > that,
>>> > > > the
>>> > > > > > main
>>> > > > > > > > > goal
>>> > > > > > > > > > of
>>> > > > > > > > > > > > > this
>>> > > > > > > > > > > > > > > task
>>> > > > > > > > > > > > > > > > > is
>>> > > > > > > > > > > > > > > > > > to
>>> > > > > > > > > > > > > > > > > > > > > store
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> data in
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > compressed form.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > This is what I
>>> need
>>> > > from
>>> > > > > > > Ignite
>>> > > > > > > > as
>>> > > > > > > > > > its
>>> > > > > > > > > > > > > user.
>>> > > > > > > > > > > > > > > > > > > Compression
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> provides
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> economy
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > on
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > servers.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > We can store
>>> more
>>> > data
>>> > > > on
>>> > > > > > same
>>> > > > > > > > > > servers
>>> > > > > > > > > > > > at
>>> > > > > > > > > > > > > > the
>>> > > > > > > > > > > > > > > > cost
>>> > > > > > > > > > > > > > > > > > of
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> increasing CPU
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > utilization.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > I'm researching
>>> a
>>> > > > > > possibility
>>> > > > > > > of
>>> > > > > > > > > > > > > > > implementation
>>> > > > > > > > > > > > > > > > of
>>> > > > > > > > > > > > > > > > > > > > > > compression
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> at the
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > cache-level.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Any thoughts?
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > --
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Best regards,
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Vyacheslav
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > --
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > View this
>>> message in
>>> > > > > > context:
>>> > > > > > > > > > > > > > > > > http://apache-ignite-
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > developers.2346864.n4.nabble.
>>> > > > > > > > > > > > > > > > > > com/Data-compression-in-
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > Ignite-2-0-tp10099p16317.html
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Sent from the
>>> Apache
>>> > > > > Ignite
>>> > > > > > > > > > Developers
>>> > > > > > > > > > > > > > mailing
>>> > > > > > > > > > > > > > > > > list
>>> > > > > > > > > > > > > > > > > > > > > archive
>>> > > > > > > > > > > > > > > > > > > > > > at
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Nabble.com.
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> --
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Alexey Kuznetsov
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> --
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Best Regards,
>>> > Vyacheslav
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > --
>>> > > > > > > > > > > > > > > > > > > > > > >
>>>
>> ...
>
> [Message clipped]

Re: Data compression in Ignite 2.0

Posted by ds...@apache.org.
I would prefer that we reuse an existing compression protocol, but at the table level.

If that is not possible, then we should go with a shared mapping approach. Any idea how hard that would be?
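
To make the "shared mapping" option concrete, here is a minimal sketch in Java of a cache-level dictionary that stores each distinct value once and keeps only small integer references per row. The class SharedDictionary and its methods encode, decode and size are hypothetical names chosen for illustration; none of this is existing Ignite API, and a real implementation would also have to deal with concurrency, persistence and distribution across nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Ignite API: a cache-level dictionary that stores
// each distinct value once and hands out small integer references to it.
final class SharedDictionary {
    private final Map<String, Integer> idsByValue = new HashMap<>();
    private final List<String> valuesById = new ArrayList<>();

    /** Returns the existing id for the value, or registers it under a new id. */
    int encode(String value) {
        Integer id = idsByValue.get(value);
        if (id == null) {
            id = valuesById.size();
            idsByValue.put(value, id);
            valuesById.add(value);
        }
        return id;
    }

    /** Resolves an id back to the stored value. */
    String decode(int id) {
        return valuesById.get(id);
    }

    /** Number of distinct values actually stored. */
    int size() {
        return valuesById.size();
    }

    public static void main(String[] args) {
        SharedDictionary dict = new SharedDictionary();
        String[] column = {"Moscow", "London", "Moscow", "Moscow", "London"};
        int[] refs = new int[column.length];
        // Each row keeps only an int reference; equal values share one entry.
        for (int i = 0; i < column.length; i++)
            refs[i] = dict.encode(column[i]);
        System.out.println(dict.size() + " distinct values for " + refs.length + " rows");
    }
}
```

With N equal values, the dictionary holds one copy and the rows hold N integer references, which is the effect described in the quoted message below.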

D.

On Aug 1, 2017, at 11:15 AM, Vladimir Ozerov <vo...@gridgain.com> wrote:
>Vyacheslav,
>
>This is not about my needs, but about the product :-) BinaryObject is a
>central entity used for both data transfer and data storage. This is
>both
>good and bad at the same time.
>
>Good thing is that as we optimize binary protocol, we improve both
>network
>and storage performance at the same time. We have at least 3 things
>which
>will be included into the product soon: varint encoding [1], optimized
>string encoding [2] and null-field optimization [3]. Bad thing is that
>binary object format is not well suited for data storage optimizations,
>including compression. For example, one good compression technique is
>to
>organize data in column-store format, or to introduce shared
>"dictionary"
>with unique values on cache level. In both cases N equal values are not
>stored N times. Instead, we store one value and N references to it, or
>so.
>This way 2x-10x compression is possible depending on workload type.
>Binary
>object protocol with some compression on top of it cannot give such
>improvement, because it will compress data in individual objects,
>instead
>of compressing the whole cache data in a single context.
>
>That said, I propose to give up adding compression to BinaryObject.
>This is
>a dead end. Instead, we should:
>1) Optimize protocol itself to be more compact, as described in
>aforementioned Ignite tickets
>2) Start new discussion about storage compression
>
>You can read papers of other vendors to get better understanding on
>possible compression options. E.g. Oracle has a lot of compression
>techniques, including heat maps, background compression, per-block
>compression, data dictionaries, etc. [4].
>
>[1] https://issues.apache.org/jira/browse/IGNITE-5097
>[2] https://issues.apache.org/jira/browse/IGNITE-5655
>[3] https://issues.apache.org/jira/browse/IGNITE-3939
>[4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
>
>Vladimir.
>
>
>On Tue, Jul 11, 2017 at 6:56 PM, Vyacheslav Daradur
><da...@gmail.com>
>wrote:
>
>> Hi Igniters!
>>
>> I'd like to continue developing and discussing compression in Ignite.
>>
>> Vladimir, could you propose a design for the compression feature in
>> Ignite that suits you?
>>
>> 2017-06-15 16:13 GMT+03:00 Vyacheslav Daradur <da...@gmail.com>:
>>
>>> Hi Igniters.
>>>
>>> Vladimir, I want to propose another design for implementing
>>> per-field compression.
>>>
>>> 1) We will add a new step in the method prepareForCache (for example)
>>> of CacheObject, or in GridCacheMapEntry.
>>>
>>> At this step, after marshalling an object, we will compress the
>>> fields of the object that were described in advance.
>>> The user will describe the class fields to compress in a separate
>>> entity, such as Metadata.
>>>
>>> For compression, we will introduce another entity, for example
>>> CompressionProcessor, which will work with the byte array (the
>>> marshalled object).
>>> The entity will read the byte arrays of the described fields, compress
>>> them, and rewrite the binary representation of the whole object.
>>> After processing, the object will be put in the cache.
>>>
>>> In this case the design is not tied to the binary infrastructure.
>>> But there is a big heap-memory overhead for the buffer.
>>>
>>> 2) Another solution is to compress the byte array of the whole object
>>> when copying it to off-heap.
>>> But in this case I don't yet understand how to support querying and
>>> indexing.
>>>
>>>
>>> 2017-06-09 11:21 GMT+03:00 Sergey Kozlov <sk...@gridgain.com>:
>>>
>>>> Hi
>>>>
>>>> * "Per-field compression" is applicable for huge BLOB fields and will
>>>> impose restrictions such as being unable to index such fields, slower
>>>> data retrieval, and potential OOM issues if the compression ratio is
>>>> too high.
>>>> But for some cases it makes sense.
>>>>
>>>> On Fri, Jun 9, 2017 at 11:11 AM, Антон Чураев
><ch...@gmail.com>
>>>> wrote:
>>>>
>>>> > It seems that Dmitriy is referring to transparent data encryption. It
>>>> > is used throughout the whole database industry.
>>>> >
>>>> > 2017-06-09 10:50 GMT+03:00 Vladimir Ozerov
><vo...@gridgain.com>:
>>>> >
>>>> > > Dima,
>>>> > >
>>>> > > Encryption of certain fields is as bad as compression. First, it is
>>>> > > a huge change, which makes the already complex binary protocol even
>>>> > > more complex. Second, it has to be ported to the CPP and .NET
>>>> > > platforms, as well as to JDBC and ODBC.
>>>> > > Last, but most important: encrypting sensitive data is not our
>>>> > > headache. This is the user's responsibility. Nobody in their right
>>>> > > mind will store passwords in plain form. Instead, the user should
>>>> > > encrypt them on their own, choosing proper encryption parameters:
>>>> > > algorithms, key lengths, salts, etc. How are you going to expose
>>>> > > this in the API or configuration?
>>>> > >
>>>> > > We should not implement data encryption on the binary level; this is
>>>> > > out of the question. Encryption should be implemented on the
>>>> > > application level (user effort), the transport layer (SSL, which we
>>>> > > already have), and possibly on the disk level (there are tools for
>>>> > > this already).
>>>> > >
>>>> > >
>>>> > > On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur <
>>>> daradurvs@gmail.com>
>>>> > > wrote:
>>>> > >
>>>> > > > >> which is much less useful.
>>>> > > > I note that in some cases it reduces the size of an object by more
>>>> > > > than half.
>>>> > > >
>>>> > > > >> Would it be possible to change your implementation to handle the
>>>> > > > >> encryption instead?
>>>> > > > Yes, of course. There is not much difference between compression
>>>> > > > and encryption, including in my implementation of per-field
>>>> > > > compression.
>>>> > > >
>>>> > > > 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan
><dsetrakyan@apache.org
>>>> >:
>>>> > > >
>>>> > > > > Vyacheslav,
>>>> > > > >
>>>> > > > > When this feature started out as data compression in
>Ignite, it
>>>> > sounded
>>>> > > > > very useful. Now it is unfolding as a per-field
>compression,
>>>> which is
>>>> > > > much
>>>> > > > > less useful. In fact, it is questionable whether it is
>useful at
>>>> all.
>>>> > > The
>>>> > > > > fact that this feature is implemented does not make it
>mandatory
>>>> for
>>>> > > the
>>>> > > > > community to accept it.
>>>> > > > >
>>>> > > > > However, as I mentioned before, per-field encryption is
>very
>>>> useful,
>>>> > as
>>>> > > > it
>>>> > > > > would allow users to automatically encrypt certain sensitive
>fields,
>>>> > like
>>>> > > > > passwords, credit card numbers, etc. There is not much
>conceptual
>>>> > > > > difference between compressing a field vs encrypting a
>field.
>>>> Would
>>>> > it
>>>> > > be
>>>> > > > > possible to change your implementation to handle the
>encryption
>>>> > > instead?
>>>> > > > >
>>>> > > > > D.
>>>> > > > >
>>>> > > > > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur <
>>>> > > daradurvs@gmail.com
>>>> > > > >
>>>> > > > > wrote:
>>>> > > > >
>>>> > > > > > Guys, I want to be clear:
>>>> > > > > > * The "per-field compression" design is the result of researching
>>>> > > > > > the binary infrastructure of Ignite and some of its other parts
>>>> > > > > > (querying, indexing, etc.)
>>>> > > > > > * Full compression of an object would be more effective, but in
>>>> > > > > > this case there is no compatibility with querying and indexing (or
>>>> > > > > > there is a large overhead from decompressing the full object (or
>>>> > > > > > cache pages) on demand)
>>>> > > > > > * "Per-field compression" is one of the ways to implement the
>>>> > > > > > compression feature
>>>> > > > > >
>>>> > > > > > I'm new to Ignite, so I may be mistaken about some things.
>>>> > > > > > For the last 3-4 months I've tried to start a discussion about a
>>>> > > > > > design, but nobody has answered (except Dmitriy and Valentin, who
>>>> > > > > > were interested in how it works).
>>>> > > > > > But I understand that this is a community and nobody is obliged
>>>> > > > > > to anybody.
>>>> > > > > >
>>>> > > > > > There are strong Ignite experts.
>>>> > > > > > If they can help me and the community with a design of the
>>>> > > > > > compression feature, it will be great.
>>>> > > > > > At the moment I have the desire and the time to work on the
>>>> > > > > > development of the compression feature in Ignite.
>>>> > > > > > Let's use this opportunity :)
>>>> > > > > >
>>>> > > > > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan <
>>>> dsetrakyan@apache.org
>>>> > >:
>>>> > > > > >
>>>> > > > > > > Igniters,
>>>> > > > > > >
>>>> > > > > > > I have never seen a single Ignite user asking about
>>>> > > > > > > compressing a single field. However, we have had requests to
>>>> > > > > > > secure certain fields, e.g. passwords.
>>>> > > > > > >
>>>> > > > > > > I personally do not think per-field compression is needed,
>>>> > > > > > > unless we can point out some concrete real life use cases.
>>>> > > > > > >
>>>> > > > > > > D.
>>>> > > > > > >
>>>> > > > > > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur
>>>> > > > > > > <daradurvs@gmail.com> wrote:
>>>> > > > > > >
>>>> > > > > > > > Anton,
>>>> > > > > > > >
>>>> > > > > > > > >> I thought that if compressed data is stored in memory,
>>>> > > > > > > > >> it will also be transmitted over the wire compressed.
>>>> > > > > > > > >> Is that right?
>>>> > > > > > > >
>>>> > > > > > > > In the per-field compression case - yes.
>>>> > > > > > > >
>>>> > > > > > > > 2017-06-08 13:36 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>>> > > > > > > >
>>>> > > > > > > > > Guys, could you please help me.
>>>> > > > > > > > > I thought that if compressed data is stored in memory,
>>>> > > > > > > > > it will also be transmitted over the wire compressed.
>>>> > > > > > > > > Is that right?
>>>> > > > > > > > >
>>>> > > > > > > > > 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > >
>>>> > > > > > > > > > Vladimir,
>>>> > > > > > > > > >
>>>> > > > > > > > > > The main problem I'm trying to solve is storing data in
>>>> > > > > > > > > > memory in a compressed form via Ignite.
>>>> > > > > > > > > > The main goal is to use memory more effectively.
>>>> > > > > > > > > >
>>>> > > > > > > > > > >> here the much simpler step would be to do full
>>>> > > > > > > > > > >> compression on a per-cache basis rather than dealing
>>>> > > > > > > > > > >> with the per-field case.
>>>> > > > > > > > > >
>>>> > > > > > > > > > Please explain your idea. Compress data by memory page?
>>>> > > > > > > > > > Is it compatible with querying and indexing?
>>>> > > > > > > > > >
>>>> > > > > > > > > > >> In the end, if a user would like to compress a
>>>> > > > > > > > > > >> particular field, he can always do it on his own
>>>> > > > > > > > > > I think we mustn't think this way: if a user needs
>>>> > > > > > > > > > something, he tries to choose a tool which has this
>>>> > > > > > > > > > feature OOTB.
>>>> > > > > > > > > >
>>>> > > > > > > > > > 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>>>> > > > > > > > > >
>>>> > > > > > > > > > > Igniters,
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > Honestly, I still do not see how to apply this feature
>>>> > > > > > > > > > > to Ignite gracefully. And the overall approach of
>>>> > > > > > > > > > > compressing only particular fields looks
>>>> > > > > > > > > > > overcomplicated to me. Remember that our main use case
>>>> > > > > > > > > > > is an application without classes on the server. It
>>>> > > > > > > > > > > means that any kind of annotations are inapplicable.
>>>> > > > > > > > > > > To be more precise: a proper API should be implemented
>>>> > > > > > > > > > > to handle the no-class case (e.g. how would one build
>>>> > > > > > > > > > > such an object through BinaryBuilder without a
>>>> > > > > > > > > > > class?), and only then add annotations as a convenient
>>>> > > > > > > > > > > addition to the more basic API.
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > It seems to me that a full implementation, which takes
>>>> > > > > > > > > > > into account a proper "classless" API, changes to
>>>> > > > > > > > > > > binary metadata to reflect compressed fields, changes
>>>> > > > > > > > > > > to SQL, changes to the binary protocol, and porting to
>>>> > > > > > > > > > > .NET and CPP, will yield a very complex solution with
>>>> > > > > > > > > > > little value to the product.
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > Instead, as I proposed earlier, it seems that we'd
>>>> > > > > > > > > > > better start with the problem we are trying to solve.
>>>> > > > > > > > > > > Basically, compression could help in two cases:
>>>> > > > > > > > > > > 1) Transmitting data over the wire - it should be
>>>> > > > > > > > > > > implemented on the communication layer and should not
>>>> > > > > > > > > > > affect the binary serialization component a lot.
>>>> > > > > > > > > > > 2) Storing data in memory - here the much simpler step
>>>> > > > > > > > > > > would be to do full compression on a per-cache basis
>>>> > > > > > > > > > > rather than dealing with the per-field case.
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > In the end, if a user would like to compress a
>>>> > > > > > > > > > > particular field, he can always do it on his own, and
>>>> > > > > > > > > > > set the already compressed field to our BinaryObject.
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > Vladimir.
>>>> > > > > > > > > > >
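The do-it-yourself option mentioned in the message above (compress a field in application code, then store the already compressed bytes in the object) can be sketched with the JDK's own java.util.zip classes. This is only an illustration under assumptions: the names FieldCompression, compress and decompress are hypothetical and are not part of Ignite, and DEFLATE merely stands in for whatever codec an application would actually choose.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not an Ignite API): compresses one field in user code.
final class FieldCompression {
    private FieldCompression() {}

    // Encode the string as UTF-8 and compress it with DEFLATE.
    static byte[] compress(String value) {
        Deflater deflater = new Deflater();
        deflater.setInput(value.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of compress(); malformed input is reported as an unchecked exception.
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the input was cut short.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalArgumentException("Truncated compressed field");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed field", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The resulting byte[] would then be stored like any other byte-array field. The likely trade-off, as raised elsewhere in this thread, is that the content of such a field is opaque to queries and indexes until the application decompresses it.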
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur
>>>> > > > > > > > > > > <daradurvs@gmail.com> wrote:
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > > Valentin,
>>>> > > > > > > > > > > >
>>>> > > > > > > > > > > > Yes, I have the prototype [1][2].
>>>> > > > > > > > > > > >
>>>> > > > > > > > > > > > You can see an example of a Java class [3] that I
>>>> > > > > > > > > > > > used in my benchmark.
>>>> > > > > > > > > > > > For example:
>>>> > > > > > > > > > > > class Foo {
>>>> > > > > > > > > > > >     @BinaryCompression
>>>> > > > > > > > > > > >     String data;
>>>> > > > > > > > > > > > }
>>>> > > > > > > > > > > > If a user decides to store the object in compressed
>>>> > > > > > > > > > > > form, he can use the annotation @BinaryCompression
>>>> > > > > > > > > > > > as shown above.
>>>> > > > > > > > > > > > It means the annotated field 'data' will be
>>>> > > > > > > > > > > > compressed at marshalling.
>>>> > > > > > > > > > > >
>>>> > > > > > > > > > > > [1] https://github.com/apache/ignite/pull/1951
>>>> > > > > > > > > > > > [2] https://issues.apache.org/jira/browse/IGNITE-5226
>>>> > > > > > > > > > > > [3] https://github.com/daradurvs/ignite-compression/blob/master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java
>>>> > > > > > > > > > > >
>>>> > > > > > > > > > > > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <valentin.kulichenko@gmail.com>:
>>>> > > > > > > > > > > >
>>>> > > > > > > > > > > > > Vyacheslav, Anton,
>>>> > > > > > > > > > > > >
>>>> > > > > > > > > > > > > Are there any ideas and/or prototypes for the API?
>>>> > > > > > > > > > > > > Your design suggestions seem to make sense, but I
>>>> > > > > > > > > > > > > would like to see how all this will look from a
>>>> > > > > > > > > > > > > user's standpoint.
>>>> > > > > > > > > > > > >
>>>> > > > > > > > > > > > > -Val
>>>> > > > > > > > > > > > >
>>>> > > > > > > > > > > > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев
>>>> > > > > > > > > > > > > <churaev.an@gmail.com> wrote:
>>>> > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > Vyacheslav, correct me if something is wrong.
>>>> > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > We could give users the opportunity to choose
>>>> > > > > > > > > > > > > > between CPU usage and MEM/NET usage by
>>>> > > > > > > > > > > > > > compressing some attributes of stored objects.
>>>> > > > > > > > > > > > > > You have studied the design, and it is possible
>>>> > > > > > > > > > > > > > to localize the changes in marshalling without
>>>> > > > > > > > > > > > > > affecting performance or current functionality.
>>>> > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > I think that it's useful for our project and
>>>> > > > > > > > > > > > > > users.
>>>> > > > > > > > > > > > > > Community, what do you think about this
>>>> > > > > > > > > > > > > > proposal?
>>>> > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > In short,
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > During marshalling, a field is represented as
>>>> > > > > > > > > > > > > > > a BinaryFieldAccessor, which manages its
>>>> > > > > > > > > > > > > > > marshalling. It checks if the field is marked
>>>> > > > > > > > > > > > > > > with the annotation @BinaryCompression; in
>>>> > > > > > > > > > > > > > > that case the binary representation of the
>>>> > > > > > > > > > > > > > > field (byte array) will be compressed. It will
>>>> > > > > > > > > > > > > > > be marked as compressed by a type constant
>>>> > > > > > > > > > > > > > > (GridBinaryMarshaller.COMPRESSED); after this
>>>> > > > > > > > > > > > > > > the compressed byte array will be included in
>>>> > > > > > > > > > > > > > > the binary representation of the whole object.
>>>> > > > > > > > > > > > > > > Note, the header of the marshalled object will
>>>> > > > > > > > > > > > > > > not be compressed. Compression affects only
>>>> > > > > > > > > > > > > > > the object's field representation.
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > Objects in IgniteCache are represented as
>>>> > > > > > > > > > > > > > > BinaryObject, which is a wrapper over the byte
>>>> > > > > > > > > > > > > > > array of the marshalled object.
>>>> > > > > > > > > > > > > > > BinaryObject provides some useful methods,
>>>> > > > > > > > > > > > > > > which are used by Ignite systems.
>>>> > > > > > > > > > > > > > > For example, queries use the
>>>> > > > > > > > > > > > > > > BinaryObject#field method, which deserializes
>>>> > > > > > > > > > > > > > > only a field of the object, without
>>>> > > > > > > > > > > > > > > deserializing the whole object.
>>>> > > > > > > > > > > > > > > During deserialization, if the
>>>> > > > > > > > > > > > > > > BinaryObject#field method meets the constant
>>>> > > > > > > > > > > > > > > of the compressed type, it decompresses this
>>>> > > > > > > > > > > > > > > byte array, then continues unmarshalling as
>>>> > > > > > > > > > > > > > > usual.
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > Now, I have introduced the Compressor
>>>> > > > > > > > > > > > > > > interface in IgniteConfiguration; it allows a
>>>> > > > > > > > > > > > > > > user to use their own implementation of a
>>>> > > > > > > > > > > > > > > compressor - it is the requirement in the
>>>> > > > > > > > > > > > > > > task [1].
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > As far as I know, Vladimir Ozerov doesn't like
>>>> > > > > > > > > > > > > > > the idea of granting this opportunity to the
>>>> > > > > > > > > > > > > > > user.
>>>> > > > > > > > > > > > > > > In that case we can choose a compression
>>>> > > > > > > > > > > > > > > algorithm which we will provide by default and
>>>> > > > > > > > > > > > > > > move the interface to the internals of the
>>>> > > > > > > > > > > > > > > binary infrastructure.
>>>> > > > > > > > > > > > > > > For this case I've prepared the benchmarks
>>>> > > > > > > > > > > > > > > which I sent earlier.
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > I vote for the ZSTD algorithm [2]; it provides
>>>> > > > > > > > > > > > > > > a good compression ratio and good throughput.
>>>> > > > > > > > > > > > > > > It has implementations in Java, .NET and C++,
>>>> > > > > > > > > > > > > > > and has an ASF-friendly license, so we can use
>>>> > > > > > > > > > > > > > > it on all Ignite platforms.
>>>> > > > > > > > > > > > > > > You can look at an assessment of this
>>>> > > > > > > > > > > > > > > algorithm in my benchmarks.
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-3592
>>>> > > > > > > > > > > > > > > [2] https://github.com/facebook/zstd
>>>> > > > > > > > > > > > > > >
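The per-field scheme described in the message above (check the annotation, compress the field's bytes, mark them with a type constant, and decompress on read when that constant is met) can be sketched roughly as follows. Everything here is an assumption made for illustration and is not the code from the pull request: the Compressor interface shape, the FieldCodec class, the tag values 1 and 2, and the tag/length/payload framing are all hypothetical, and DEFLATE from the JDK is used instead of ZSTD only to keep the example dependency-free.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in for the annotation proposed in the thread.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {}

// Hypothetical pluggable codec; the prototype's real interface may differ.
interface Compressor {
    byte[] compress(byte[] raw);
    byte[] decompress(byte[] packed);
}

// DEFLATE-based implementation, chosen only because it ships with the JDK.
final class DeflateCompressor implements Compressor {
    @Override public byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    @Override public byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalArgumentException("Truncated compressed field");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed field", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}

// Writes one string field as [tag][length][payload]; tag values are made up.
final class FieldCodec {
    static final byte STRING = 1;
    static final byte COMPRESSED = 2;

    private FieldCodec() {}

    static byte[] writeField(Class<?> owner, String fieldName, String value, Compressor c) {
        boolean compress;
        try {
            compress = owner.getDeclaredField(fieldName).isAnnotationPresent(BinaryCompression.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
        byte[] plain = frame(STRING, value.getBytes(StandardCharsets.UTF_8));
        // An annotated field keeps its normal encoding, wrapped in a COMPRESSED frame.
        return compress ? frame(COMPRESSED, c.compress(plain)) : plain;
    }

    static String readField(byte[] bytes, Compressor c) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte tag = buf.get();
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        if (tag == COMPRESSED)  // unwrap, then continue unmarshalling as usual
            return readField(c.decompress(payload), c);
        if (tag != STRING)
            throw new IllegalArgumentException("Unknown type tag: " + tag);
        return new String(payload, StandardCharsets.UTF_8);
    }

    private static byte[] frame(byte tag, byte[] payload) {
        return ByteBuffer.allocate(5 + payload.length).put(tag).putInt(payload.length).put(payload).array();
    }
}

// Example class in the spirit of the one shown in the thread.
class Foo {
    @BinaryCompression
    String data;
    String name;
}
```

In this sketch a reader needs no class metadata to tell compressed fields from plain ones, because the tag travels with the bytes; a field without the annotation is written exactly as before, which matches the claim in the thread that unannotated data is unaffected.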
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > 2017-06-06 16:02 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>>> > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > Looks good to me.
>>>> > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > Could you describe the design of the
>>>> > > > > > > > > > > > > > > > implementation in a couple of sentences, so
>>>> > > > > > > > > > > > > > > > that we can estimate the completeness and
>>>> > > > > > > > > > > > > > > > complexity of the proposal?
>>>> > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > Anton,
>>>> > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > Of course, the solution does not affect the
>>>> > > > > > > > > > > > > > > > > existing implementation. I mean, there are
>>>> > > > > > > > > > > > > > > > > no changes if the user does not use the
>>>> > > > > > > > > > > > > > > > > annotation @BinaryCompression (no
>>>> > > > > > > > > > > > > > > > > performance changes).
>>>> > > > > > > > > > > > > > > > > Only if the user decides to use
>>>> > > > > > > > > > > > > > > > > compression on a specific field or fields
>>>> > > > > > > > > > > > > > > > > of a class will compression be applied, at
>>>> > > > > > > > > > > > > > > > > marshalling, to the annotated fields.
>>>> > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>>> > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > Vyacheslav,
>>>> > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > Is it possible to propose an
>>>> > > > > > > > > > > > > > > > > > implementation that can be switched on
>>>> > > > > > > > > > > > > > > > > > on demand?
>>>> > > > > > > > > > > > > > > > > > In this case it should not affect the
>>>> > > > > > > > > > > > > > > > > > performance of the current solution.
>>>> > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > I mean that users should decide what is
>>>> > > > > > > > > > > > > > > > > > more important for them: throughput or
>>>> > > > > > > > > > > > > > > > > > memory/net usage.
>>>> > > > > > > > > > > > > > > > > > Maybe they will choose to compress not
>>>> > > > > > > > > > > > > > > > > > all objects, or only some attributes of
>>>> > > > > > > > > > > > > > > > > > objects.
>>>> > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > Conclusion:
>>>> > > > > > > > > > > > > > > > > > > The provided solution allows reducing
>>>> > > > > > > > > > > > > > > > > > > the size of an object in IgniteCache at
>>>> > > > > > > > > > > > > > > > > > > the cost of a throughput reduction
>>>> > > > > > > > > > > > > > > > > > > (small in some cases); it depends on
>>>> > > > > > > > > > > > > > > > > > > the part of the object which will be
>>>> > > > > > > > > > > > > > > > > > > compressed and on the compression
>>>> > > > > > > > > > > > > > > > > > > algorithm.
>>>> > > > > > > > > > > > > > > > > > > I mean, we can make more effective use
>>>> > > > > > > > > > > > > > > > > > > of memory, and in some cases it can
>>>> > > > > > > > > > > > > > > > > > > reduce the load on the interconnect
>>>> > > > > > > > > > > > > > > > > > > (replication, rebalancing).
>>>> > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > It will be particularly useful for
>>>> > > > > > > > > > > > > > > > > > > object fields which are large text
>>>> > > > > > > > > > > > > > > > > > > (>~ 250 bytes) and can be effectively
>>>> > > > > > > > > > > > > > > > > > > compressed.
>>>> > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>>>> > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > Vyacheslav, thank you! But could you
>>>> > > > > > > > > > > > > > > > > > > > please provide conclusions or
>>>> > > > > > > > > > > > > > > > > > > > proposals based on these benchmarks?
>>>> > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > Dmitry,
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > Excel pages:
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > 1) "Compression ratio (2)" - shows object size with and without compression. (Conditions: literal text)
>>>> > > > > > > > > > > > > > > > > > > > > 1st graph shows the compression ratios of different compression algorithms depending on the size of the compressed field.
>>>> > > > > > > > > > > > > > > > > > > > > 2nd graph shows the size of objects depending on field sizes and compression algorithms.
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > 2) "Compression ratio (1)" - shows object size with and without compression. (Conditions: poorly compressible character sequence)
>>>> > > > > > > > > > > > > > > > > > > > > 1st graph shows the compression ratios of different compression algorithms depending on the size of the compressed field.
>>>> > > > > > > > > > > > > > > > > > > > > 2nd graph shows the size of objects depending on field sizes and compression algorithms.
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > 3) "put-avg" - shows the average time of the "put" operation depending on size and compression algorithms.
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > 4) "put-thrpt" - shows the throughput of the "put" operation depending on size and compression algorithms.
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > 5) "get-avg" - shows the average time of the "get" operation depending on size and compression algorithms.
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > 6) "get-thrpt" - shows the throughput of the "get" operation depending on size and compression algorithms.
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>>>> > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > Vladimir, I am not sure how to interpret the graphs? What are we looking at?
>>>> > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
>>>> > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > Hi, Igniters.
>>>> > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > I've prepared some benchmarks. Results: [1].
>>>> > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > And I've prepared an evaluation in the form of diagrams [2].
>>>> > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > I hope this helps to interest the community and speeds up a reaction to this improvement :)
>>>> > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > [1] https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
>>>> > > > > > > > > > > > > > > > > > > > > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
>>>> > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > > Guys, any thoughts?
>>>> > > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> Hi guys,
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> I've prepared the PR to show my idea.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> About querying - I've just copied existing tests and have annotated the testing data.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> It means fields which are marked with @BinaryCompression will be compressed at marshalling via BinaryMarshaller.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> This solution has no effect on existing data or project architecture.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> I'll be glad to see your thoughts.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>> Dmitriy,
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>> I have a ready prototype. I want to show it.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>> It is always easier to discuss with an example.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>> Vyacheslav,
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>> I think it is a bit premature to provide a PR without getting a community consensus on the dev list.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>> Please allow some time for the community to respond.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>> D.
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>>
>>>> > > > > > > > > > > > > > > > > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > I created the
>ticket:
>>>> > > > > > > > > > > > > > > https://issues.apache.org/jira
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>>
>/browse/IGNITE-5226
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > I'll prepare a
>PR with
>>>> > > > described
>>>> > > > > > > > > solution
>>>> > > > > > > > > > in
>>>> > > > > > > > > > > > > > couple
>>>> > > > > > > > > > > > > > > of
>>>> > > > > > > > > > > > > > > > > > days.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > 2017-05-15 15:05
>>>> GMT+03:00
>>>> > > > > > > Vyacheslav
>>>> > > > > > > > > > > Daradur
>>>> > > > > > > > > > > > <
>>>> > > > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>>> > > > > > > > > > > > > > > > > > > > > > > >:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Hi, Igniters!
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Apache 2.0 is
>>>> released.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Let's continue
>the
>>>> > > > discussion
>>>> > > > > > > about
>>>> > > > > > > > a
>>>> > > > > > > > > > > > > > compression
>>>> > > > > > > > > > > > > > > > > > design.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > At the moment,
>I
>>>> found
>>>> > > only
>>>> > > > > one
>>>> > > > > > > > > solution
>>>> > > > > > > > > > > > which
>>>> > > > > > > > > > > > > > is
>>>> > > > > > > > > > > > > > > > > > > compatible
>>>> > > > > > > > > > > > > > > > > > > > > > with
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > querying
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > and indexing,
>this
>>>> is
>>>> > > > > > > > > per-objects-field
>>>> > > > > > > > > > > > > > > compression.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Per-fields
>>>> compression
>>>> > > means
>>>> > > > > > that
>>>> > > > > > > > > > metadata
>>>> > > > > > > > > > > > (a
>>>> > > > > > > > > > > > > > > > header)
>>>> > > > > > > > > > > > > > > > > of
>>>> > > > > > > > > > > > > > > > > > > an
>>>> > > > > > > > > > > > > > > > > > > > > > object
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> won't
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > be compressed,
>only
>>>> > > > serialized
>>>> > > > > > > > values
>>>> > > > > > > > > of
>>>> > > > > > > > > > > an
>>>> > > > > > > > > > > > > > object
>>>> > > > > > > > > > > > > > > > > > fields
>>>> > > > > > > > > > > > > > > > > > > > (in
>>>> > > > > > > > > > > > > > > > > > > > > > > bytes
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> array
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > form) will be
>>>> > compressed.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > This solution
>have
>>>> some
>>>> > > > > > > contentious
>>>> > > > > > > > > > > issues:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > - small
>values, like
>>>> > > > > primitives
>>>> > > > > > > and
>>>> > > > > > > > > > short
>>>> > > > > > > > > > > > > > arrays -
>>>> > > > > > > > > > > > > > > > > there
>>>> > > > > > > > > > > > > > > > > > > > isn't
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> sense to
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > compress them;
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > - there is no
>>>> possible
>>>> > to
>>>> > > > use
>>>> > > > > > > > > > compression
>>>> > > > > > > > > > > > with
>>>> > > > > > > > > > > > > > > > > > > > java-predefined
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> types;
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > We can provide
>an
>>>> > > > annotation,
>>>> > > > > > > > > > > > > > @IgniteCompression -
>>>> > > > > > > > > > > > > > > > for
>>>> > > > > > > > > > > > > > > > > > > > > example,
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> which can
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > be used by
>users for
>>>> > > marking
>>>> > > > > > > fields
>>>> > > > > > > > to
>>>> > > > > > > > > > > > > compress.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Any thoughts?
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > Maybe someone
>>>> already
>>>> > have
>>>> > > > > ready
>>>> > > > > > > > > design?
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > 2017-04-10
>11:06
>>>> > GMT+03:00
>>>> > > > > > > > Vyacheslav
>>>> > > > > > > > > > > > Daradur
>>>> > > > > > > > > > > > > <
>>>> > > > > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Alexey,
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Yes, I've
>read it.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Ok, let's
>discuss
>>>> about
>>>> > > > > public
>>>> > > > > > > API
>>>> > > > > > > > > > > design.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> I think we
>need to
>>>> add
>>>> > > > some a
>>>> > > > > > > > > configure
>>>> > > > > > > > > > > > > entity
>>>> > > > > > > > > > > > > > to
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>>
>CacheConfiguration,
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> which will
>contain
>>>> the
>>>> > > > > > Compressor
>>>> > > > > > > > > > > interface
>>>> > > > > > > > > > > > > > > > > > > implementation
>>>> > > > > > > > > > > > > > > > > > > > > and
>>>> > > > > > > > > > > > > > > > > > > > > > > some
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > usefull
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> parameters.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Or maybe to
>>>> provide a
>>>> > > > > > > > > BinaryMarshaller
>>>> > > > > > > > > > > > > > decorator,
>>>> > > > > > > > > > > > > > > > > which
>>>> > > > > > > > > > > > > > > > > > > > will
>>>> > > > > > > > > > > > > > > > > > > > > be
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> compress
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> data after
>>>> marshalling.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> 2017-04-10
>10:40
>>>> > > GMT+03:00
>>>> > > > > > Alexey
>>>> > > > > > > > > > > > Kuznetsov <
>>>> > > > > > > > > > > > > > > > > > > > > > > akuznetsov@apache.org
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> >:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Vyacheslav,
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Did you read
>>>> initial
>>>> > > > > > discussion
>>>> > > > > > > > [1]
>>>> > > > > > > > > > > about
>>>> > > > > > > > > > > > > > > > > compression?
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> As far as I
>>>> remember
>>>> > we
>>>> > > > > agreed
>>>> > > > > > > to
>>>> > > > > > > > > add
>>>> > > > > > > > > > > only
>>>> > > > > > > > > > > > > > some
>>>> > > > > > > > > > > > > > > > > > > > "top-level"
>>>> > > > > > > > > > > > > > > > > > > > > > API
>>>> > > > > > > > > > > > > > > > > > > > > > > in
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > order
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> to
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> provide a
>way for
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Ignite users
>to
>>>> inject
>>>> > > > some
>>>> > > > > > sort
>>>> > > > > > > > of
>>>> > > > > > > > > > > custom
>>>> > > > > > > > > > > > > > > > > > compression.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> [1]
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > http://apache-ignite-developer
>>>> > > > > > > > > > > > > > > s.2346864.n4.nabble
>>>> > > > > > > > > > > > > > > > .
>>>> > > > > > > > > > > > > > > > > > > > > com/Data-c
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > ompression-in-Ignite-2-0-
>>>> > > > > > > > > td10099.html
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> On Mon, Apr
>10,
>>>> 2017
>>>> > at
>>>> > > > 2:19
>>>> > > > > > PM,
>>>> > > > > > > > > > > > daradurvs <
>>>> > > > > > > > > > > > > > > > > > > > > > daradurvs@gmail.com
>>>> > > > > > > > > > > > > > > > > > > > > > > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > wrote:
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Hi
>Igniters!
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > I am
>interested
>>>> in
>>>> > > this
>>>> > > > > > task.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Provide
>some
>>>> kind of
>>>> > > > > > pluggable
>>>> > > > > > > > > > > > compression
>>>> > > > > > > > > > > > > > SPI
>>>> > > > > > > > > > > > > > > > > > support
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > <
>>>> > > > > https://issues.apache.org/
>>>> > > > > > > > > > > > > > > > > jira/browse/IGNITE-3592>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > I
>developed a
>>>> > solution
>>>> > > > on
>>>> > > > > > > > > > > > > > > > BinaryMarshaller-level,
>>>> > > > > > > > > > > > > > > > > > but
>>>> > > > > > > > > > > > > > > > > > > > > > reviewer
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> has
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> rejected
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > it.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Let's
>continue
>>>> > > > discussion
>>>> > > > > of
>>>> > > > > > > > task
>>>> > > > > > > > > > > goals
>>>> > > > > > > > > > > > > and
>>>> > > > > > > > > > > > > > > > > solution
>>>> > > > > > > > > > > > > > > > > > > > > design.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > As I
>understood
>>>> > that,
>>>> > > > the
>>>> > > > > > main
>>>> > > > > > > > > goal
>>>> > > > > > > > > > of
>>>> > > > > > > > > > > > > this
>>>> > > > > > > > > > > > > > > task
>>>> > > > > > > > > > > > > > > > > is
>>>> > > > > > > > > > > > > > > > > > to
>>>> > > > > > > > > > > > > > > > > > > > > store
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> data in
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > compressed
>form.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > This is
>what I
>>>> need
>>>> > > from
>>>> > > > > > > Ignite
>>>> > > > > > > > as
>>>> > > > > > > > > > its
>>>> > > > > > > > > > > > > user.
>>>> > > > > > > > > > > > > > > > > > > Compression
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> provides
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> economy
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > on
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > servers.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > We can
>store
>>>> more
>>>> > data
>>>> > > > on
>>>> > > > > > same
>>>> > > > > > > > > > servers
>>>> > > > > > > > > > > > at
>>>> > > > > > > > > > > > > > the
>>>> > > > > > > > > > > > > > > > cost
>>>> > > > > > > > > > > > > > > > > > of
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> increasing CPU
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>utilization.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > I'm
>researching
>>>> a
>>>> > > > > > possibility
>>>> > > > > > > of
>>>> > > > > > > > > > > > > > > implementation
>>>> > > > > > > > > > > > > > > > of
>>>> > > > > > > > > > > > > > > > > > > > > > compression
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> at the
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>cache-level.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Any
>thoughts?
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > --
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Best
>regards,
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Vyacheslav
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > --
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > View this
>>>> message in
>>>> > > > > > context:
>>>> > > > > > > > > > > > > > > > > http://apache-ignite-
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > developers.2346864.n4.nabble.
>>>> > > > > > > > > > > > > > > > > > com/Data-compression-in-
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > Ignite-2-0-tp10099p16317.html
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> > Sent from
>the
>>>> Apache
>>>> > > > > Ignite
>>>> > > > > > > > > > Developers
>>>> > > > > > > > > > > > > > mailing
>>>> > > > > > > > > > > > > > > > > list
>>>> > > > > > > > > > > > > > > > > > > > > archive
>>>> > > > > > > > > > > > > > > > > > > > > > at
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Nabble.com.
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> --
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>> Alexey
>Kuznetsov
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> --
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >> Best Regards,
>>>> > Vyacheslav
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >>
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > >
>>>> > > > > > > > > > > > > > > > > > > > > > > >>>> > > --
>>>> > > > > > > > > > > > > > > > > > > > > > >
>>>>
>>> ...
>>
>> [Message clipped]

Re: Data compression in Ignite 2.0

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Hi Vyacheslav,

Yes, I would suggest that you do so.

On Fri, Aug 25, 2017 at 2:51 PM, Vyacheslav Daradur <da...@gmail.com>
wrote:

> Hi, should I close the initial ticket [1] as "Won't Fix" and add a link to
> the new discussion about storage compression [2] in the comments?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3592
> [2] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html
>
> 2017-08-09 23:05 GMT+03:00 Vyacheslav Daradur <da...@gmail.com>:
>
>> Vladimir, thank you for the detailed explanation.
>>
>> I think I've understood the main idea of the described storage compression.
>>
>> I'll join the new discussion after studying the given material and
>> completing the varint optimization [1].
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-5097
>>
>> 2017-08-02 15:43 GMT+03:00 Alexey Kuznetsov <ak...@apache.org>:
>>
>>> Vova,
>>>
>>> Finally we are back to my initial idea - to look at how "big databases"
>>> compress data :)
>>>
>>>
>>> Just a reminder of how IBM DB2 does this [1].
>>>
>>> [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
>>>
>>> On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <vo...@gridgain.com>
>>> wrote:
>>>
>>> > Vyacheslav,
>>> >
>>> > This is not about my needs, but about the product :-) BinaryObject is a
>>> > central entity used for both data transfer and data storage. This is both
>>> > good and bad at the same time.
>>> >
>>> > Good thing is that as we optimize binary protocol, we improve both network
>>> > and storage performance at the same time. We have at least 3 things which
>>> > will be included into the product soon: varint encoding [1], optimized
>>> > string encoding [2] and null-field optimization [3]. Bad thing is that
>>> > binary object format is not well suited for data storage optimizations,
>>> > including compression. For example, one good compression technique is to
>>> > organize data in column-store format, or to introduce shared "dictionary"
>>> > with unique values on cache level. In both cases N equal values are not
>>> > stored N times. Instead, we store one value and N references to it, or so.
>>> > This way 2x-10x compression is possible depending on workload type. Binary
>>> > object protocol with some compression on top of it cannot give such
>>> > improvement, because it will compress data in individual objects, instead
>>> > of compressing the whole cache data in a single context.
>>> >
>>> > That said, I propose to give up adding compression to BinaryObject. This is
>>> > a dead end. Instead, we should:
>>> > 1) Optimize protocol itself to be more compact, as described in
>>> > aforementioned Ignite tickets
>>> > 2) Start new discussion about storage compression
>>> >
>>> > You can read papers of other vendors to get better understanding on
>>> > possible compression options. E.g. Oracle has a lot of compression
>>> > techniques, including heat maps, background compression, per-block
>>> > compression, data dictionaries, etc. [4].
>>> >
>>> > [1] https://issues.apache.org/jira/browse/IGNITE-5097
>>> > [2] https://issues.apache.org/jira/browse/IGNITE-5655
>>> > [3] https://issues.apache.org/jira/browse/IGNITE-3939
>>> > [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
>>> >
>>> > Vladimir.
>>> >
>>> >
>>>
>>> --
>>> Alexey Kuznetsov
>>>
>>
>>
>>
>> --
>> Best Regards, Vyacheslav D.
>>
>
>
>
> --
> Best Regards, Vyacheslav D.
>

Re: Data compression in Ignite 2.0

Posted by Vyacheslav Daradur <da...@gmail.com>.
Hi, should I close the initial ticket [1] as "Won't Fix" and add a link to
the new discussion about storage compression [2] in the comments?

[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2]
http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html

2017-08-09 23:05 GMT+03:00 Vyacheslav Daradur <da...@gmail.com>:

> Vladimir, thank you for the detailed explanation.
>
> I think I've understood the main idea of the described storage compression.
>
> I'll join the new discussion after studying the given material and
> completing the varint optimization [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5097
>
> 2017-08-02 15:43 GMT+03:00 Alexey Kuznetsov <ak...@apache.org>:
>
>> Vova,
>>
>> Finally we are back to my initial idea - to look at how "big databases"
>> compress data :)
>>
>>
>> Just a reminder of how IBM DB2 does this [1].
>>
>> [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
>>
>> On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <vo...@gridgain.com>
>> wrote:
>>
>> > Vyacheslav,
>> >
>> > This is not about my needs, but about the product :-) BinaryObject is a
>> > central entity used for both data transfer and data storage. This is both
>> > good and bad at the same time.
>> >
>> > Good thing is that as we optimize binary protocol, we improve both network
>> > and storage performance at the same time. We have at least 3 things which
>> > will be included into the product soon: varint encoding [1], optimized
>> > string encoding [2] and null-field optimization [3]. Bad thing is that
>> > binary object format is not well suited for data storage optimizations,
>> > including compression. For example, one good compression technique is to
>> > organize data in column-store format, or to introduce shared "dictionary"
>> > with unique values on cache level. In both cases N equal values are not
>> > stored N times. Instead, we store one value and N references to it, or so.
>> > This way 2x-10x compression is possible depending on workload type. Binary
>> > object protocol with some compression on top of it cannot give such
>> > improvement, because it will compress data in individual objects, instead
>> > of compressing the whole cache data in a single context.
>> >
>> > That said, I propose to give up adding compression to BinaryObject. This is
>> > a dead end. Instead, we should:
>> > 1) Optimize protocol itself to be more compact, as described in
>> > aforementioned Ignite tickets
>> > 2) Start new discussion about storage compression
>> >
>> > You can read papers of other vendors to get better understanding on
>> > possible compression options. E.g. Oracle has a lot of compression
>> > techniques, including heat maps, background compression, per-block
>> > compression, data dictionaries, etc. [4].
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-5097
>> > [2] https://issues.apache.org/jira/browse/IGNITE-5655
>> > [3] https://issues.apache.org/jira/browse/IGNITE-3939
>> > [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
>> >
>> > Vladimir.
>> >
>> >
>>
>> --
>> Alexey Kuznetsov
>>
>
>
>
> --
> Best Regards, Vyacheslav D.
>



-- 
Best Regards, Vyacheslav D.
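
The argument quoted above, that compressing each object on its own cannot match compressing the cache data in a single context, can be illustrated with a rough sketch. It uses the JDK's java.util.zip.Deflater on made-up records; the class name, the record layout and the record count are assumptions for illustration only and are not taken from Ignite or from any ticket in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CompressionContextSketch {
    /** Returns the size in bytes of the DEFLATE-compressed form of the input. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();

        byte[] buf = new byte[4096];
        int total = 0;

        // Drain the compressor; only the number of produced bytes matters here.
        while (!deflater.finished())
            total += deflater.deflate(buf);

        deflater.end();
        return total;
    }

    /** Hypothetical serialized cache value; real BinaryObject bytes look different. */
    static byte[] record(int id) {
        String json = "{\"id\":" + id + ",\"city\":\"Saint Petersburg\",\"status\":\"ACTIVE\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns {sum of per-object compressed sizes, compressed size of all objects together}. */
    static int[] measure(int count) {
        int perObject = 0;
        ByteArrayOutputStream all = new ByteArrayOutputStream();

        for (int i = 0; i < count; i++) {
            byte[] rec = record(i);
            perObject += deflatedSize(rec); // each object has its own compression context
            all.write(rec, 0, rec.length);  // collected for the shared-context run
        }

        return new int[] {perObject, deflatedSize(all.toByteArray())};
    }

    public static void main(String[] args) {
        int[] sizes = measure(1000);
        System.out.println("per-object total: " + sizes[0] + " bytes");
        System.out.println("single context:   " + sizes[1] + " bytes");
    }
}
```

With records this repetitive, the shared-context figure should come out far smaller, because values repeated across objects are only visible to the compressor when the objects are processed together. The sketch says nothing about how a real storage-level design would keep such data queryable.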

Re: Data compression in Ignite 2.0

Posted by Vyacheslav Daradur <da...@gmail.com>.
Vladimir, thank you for the detailed explanation.

I think I've understood the main idea of the described storage compression.

I'll join the new discussion after studying the given material and
completing the varint optimization [1].

[1] https://issues.apache.org/jira/browse/IGNITE-5097

2017-08-02 15:43 GMT+03:00 Alexey Kuznetsov <ak...@apache.org>:

> Vova,
>
> Finally we are back to my initial idea - to look at how "big databases"
> compress data :)
>
>
> Just a reminder of how IBM DB2 does this [1].
>
> [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
>
> On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
> > Vyacheslav,
> >
> > This is not about my needs, but about the product :-) BinaryObject is a
> > central entity used for both data transfer and data storage. This is both
> > good and bad at the same time.
> >
> > Good thing is that as we optimize binary protocol, we improve both network
> > and storage performance at the same time. We have at least 3 things which
> > will be included into the product soon: varint encoding [1], optimized
> > string encoding [2] and null-field optimization [3]. Bad thing is that
> > binary object format is not well suited for data storage optimizations,
> > including compression. For example, one good compression technique is to
> > organize data in column-store format, or to introduce shared "dictionary"
> > with unique values on cache level. In both cases N equal values are not
> > stored N times. Instead, we store one value and N references to it, or so.
> > This way 2x-10x compression is possible depending on workload type. Binary
> > object protocol with some compression on top of it cannot give such
> > improvement, because it will compress data in individual objects, instead
> > of compressing the whole cache data in a single context.
> >
> > That said, I propose to give up adding compression to BinaryObject. This is
> > a dead end. Instead, we should:
> > 1) Optimize protocol itself to be more compact, as described in
> > aforementioned Ignite tickets
> > 2) Start new discussion about storage compression
> >
> > You can read papers of other vendors to get better understanding on
> > possible compression options. E.g. Oracle has a lot of compression
> > techniques, including heat maps, background compression, per-block
> > compression, data dictionaries, etc. [4].
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5097
> > [2] https://issues.apache.org/jira/browse/IGNITE-5655
> > [3] https://issues.apache.org/jira/browse/IGNITE-3939
> > [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
> >
> > Vladimir.
> >
> >
>
> --
> Alexey Kuznetsov
>



-- 
Best Regards, Vyacheslav D.
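
For readers unfamiliar with the varint optimization mentioned above, here is a minimal sketch of the general technique: a base-128 variable-length integer, 7 payload bits per byte, with the high bit marking that more bytes follow. This is the widely used scheme (for example in Protocol Buffers). Whether IGNITE-5097 uses exactly this layout is not stated in this thread, so the class and method names below are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;

final class VarintSketch {
    /** Encodes an int in 1 to 5 bytes: low 7 bits first, high bit set while more bytes follow. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);

        // While bits above the low 7 remain, emit a continuation byte.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so negative values terminate after 5 bytes
        }

        out.write(value); // last byte, high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;

        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;

            if ((b & 0x80) == 0)
                return result;

            shift += 7;
        }

        throw new IllegalArgumentException("Truncated varint");
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 300, 1_000_000, Integer.MAX_VALUE})
            System.out.println(v + " -> " + encode(v).length + " byte(s), decoded " + decode(encode(v)));
    }
}
```

Small values, which are typical for lengths, offsets and identifiers, take one or two bytes instead of a fixed four. The trade-off is that large and negative values take five bytes, so the gain depends on the value distribution.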

Re: Data compression in Ignite 2.0

Posted by Alexey Kuznetsov <ak...@apache.org>.
Vova,

Finally we are back to my initial idea - to look at how "big databases"
compress data :)


Just a reminder of how IBM DB2 does this [1].

[1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/

On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Vyacheslav,
>
> This is not about my needs, but about the product :-) BinaryObject is a
> central entity used for both data transfer and data storage. This is both
> good and bad at the same time.
>
> Good thing is that as we optimize binary protocol, we improve both network
> and storage performance at the same time. We have at least 3 things which
> will be included into the product soon: varint encoding [1], optimized
> string encoding [2] and null-field optimization [3]. Bad thing is that
> binary object format is not well suited for data storage optimizations,
> including compression. For example, one good compression technique is to
> organize data in column-store format, or to introduce shared "dictionary"
> with unique values on cache level. In both cases N equal values are not
> stored N times. Instead, we store one value and N references to it, or so.
> This way 2x-10x compression is possible depending on workload type. Binary
> object protocol with some compression on top of it cannot give such
> improvement, because it will compress data in individual objects, instead
> of compressing the whole cache data in a single context.
>
> That said, I propose to give up adding compression to BinaryObject. This is
> a dead end. Instead, we should:
> 1) Optimize protocol itself to be more compact, as described in
> aforementioned Ignite tickets
> 2) Start new discussion about storage compression
>
> You can read papers of other vendors to get better understanding on
> possible compression options. E.g. Oracle has a lot of compression
> techniques, including heat maps, background compression, per-block
> compression, data dictionaries, etc. [4].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5097
> [2] https://issues.apache.org/jira/browse/IGNITE-5655
> [3] https://issues.apache.org/jira/browse/IGNITE-3939
> [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
>
> Vladimir.
>
>

-- 
Alexey Kuznetsov
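
The shared "dictionary" idea from the quoted message (store each distinct value once, plus N references to it) can be sketched as below. This is a toy, in-memory illustration under assumed names (DictionaryColumn, add, get); it is not Ignite code and not the design used by any of the databases referenced in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DictionaryColumn {
    /** Value -> dictionary id, used only while adding rows. */
    private final Map<String, Integer> ids = new HashMap<>();

    /** Dictionary: each distinct value is stored exactly once. */
    private final List<String> values = new ArrayList<>();

    /** One small reference per row instead of a full copy of the value. */
    private final List<Integer> refs = new ArrayList<>();

    void add(String value) {
        Integer id = ids.get(value);

        if (id == null) {
            id = values.size();
            ids.put(value, id);
            values.add(value);
        }

        refs.add(id);
    }

    String get(int row) {
        return values.get(refs.get(row));
    }

    int rows() {
        return refs.size();
    }

    int distinctValues() {
        return values.size();
    }

    public static void main(String[] args) {
        DictionaryColumn city = new DictionaryColumn();

        for (String c : new String[] {"Moscow", "London", "Moscow", "Moscow", "London"})
            city.add(c);

        System.out.println(city.rows() + " rows, " + city.distinctValues() + " stored values");
        System.out.println("row 3 = " + city.get(3));
    }
}
```

The saving grows with the ratio of rows to distinct values, which is why the achievable compression depends on the workload. A real implementation would also have to handle dictionary growth, updates and removal of unused entries, which this sketch ignores.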