Posted to dev@ignite.apache.org by Valentin Kulichenko <va...@gmail.com> on 2016/10/28 03:07:15 UTC

Re: BinaryObject pros/cons

Cross-posting this to dev list.

Vladimir,

To be honest, I don't see much difference between null values for objects
and zero values for primitives. From the standpoint of BinaryObject
semantics, both are default values for the corresponding types. These values
will be returned from the BinaryObject.field() method regardless of whether
we actually save them in the byte array or not. Having said that, why don't
we just skip them during write?

Your optimization will still be useful though, because there are often a lot
of ints and longs that are not zeros, but are still small and can fit in 1-2
bytes. We already added such compaction in direct message marshaling and it
reduced overall traffic by around 30%.
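
As a rough illustration of that kind of compaction, here is a Python sketch
of a generic 7-bits-per-byte variable-length encoding. It is not the actual
Ignite direct marshalling code; the function names are made up for this
example and only non-negative values are handled.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte, low bits first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a value produced by encode_varint, reading one byte at a time."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7
    raise ValueError("truncated varint")
```

With this scheme values below 128 take one byte and values below 16384 take
two, but decoding has to loop over the bytes instead of doing a single
fixed-width read.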

-Val


On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Hi,
>
> I am not very concerned with the overhead of null fields, because usually it
> won't be significant. However, there is a problem with zeros. A user object
> might have lots of int/long zeros; this is not uncommon. And each zero will
> consume 4-8 additional bytes. We will probably implement a special
> optimization which writes such fields in a special compact format.
>
> Vladimir.
>
> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
>> Hi,
>>
>> Yes, null values consume memory. I believe this can be optimized, but I
>> haven't seen issues with this so far. Unless you have hundreds of fields
>> most of which are nulls (very rare case), the overhead is minimal.
>>
>> -Val
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>
>
>

Re: BinaryObject pros/cons

Posted by Dmitriy Setrakyan <ds...@apache.org>.
In my opinion, writing nulls or default values on the wire or in memory is
plain wasteful. I agree with Vladimir that the schema should be constant, but
internally we should not store the default values at all.
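
To make the schema point concrete, here is a small Python sketch. It assumes,
purely for illustration, that a schema is identified by the set of fields
that were actually written; the field names and helper functions are
hypothetical and not part of Ignite.

```python
from itertools import combinations

# Ten hypothetical numeric fields of one user type.
FIELDS = tuple(f"f{i}" for i in range(10))


def schemas_if_skipping_changes_schema(fields):
    """Every subset of written fields counts as a distinct schema."""
    return {
        subset
        for k in range(len(fields) + 1)
        for subset in combinations(fields, k)
    }


def schemas_if_schema_is_constant(fields):
    """The schema always lists all fields, whatever their values are."""
    return {tuple(fields)}


print(len(schemas_if_skipping_changes_schema(FIELDS)))  # 2**10 subsets
print(len(schemas_if_schema_is_constant(FIELDS)))       # a single schema
```

Keeping the schema constant and dropping only the stored bytes avoids the
growth in the first count.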

It sounds like a relatively simple task to implement. Do we have a ticket
for it?

D.

On Mon, Oct 31, 2016 at 1:00 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Igor,
>
> Good catch. Probably some MAX value could help us here.
>
> On Mon, Oct 31, 2016 at 9:17 PM, Igor Sapego <is...@gridgain.com> wrote:
>
> > Valentin,
> >
> > -1 was just an example. I've checked: currently we use the whole possible
> > range of offset values.
> > So if we are going to use the suggested approach, then we need to reserve
> > some value and adjust the serialization/deserialization algorithms.
> >
> > Best Regards,
> > Igor
> >
> > On Mon, Oct 31, 2016 at 8:46 PM, Valentin Kulichenko <
> > valentin.kulichenko@gmail.com> wrote:
> >
> > > Makes sense to me, but not sure about -1 in particular. Is this offset
> > > relative to object start position? What values can it have?
> > >
> > > -Val
> > >
> > > On Mon, Oct 31, 2016 at 10:38 AM, Igor Sapego <is...@gridgain.com>
> > > wrote:
> > >
> > >> Vladimir,
> > >>
> > >> How about some reserved value? For example, could an offset of -1 mean
> > >> that a default/null value should be used?
> > >>
> > >> Best Regards,
> > >> Igor
> > >>
> > >> On Mon, Oct 31, 2016 at 5:05 PM, Vladimir Ozerov <vozerov@gridgain.com>
> > >> wrote:
> > >>
> > >>> Valya,
> > >>>
> > >>> Do you have any ideas on how to implement this? We write field offsets
> > >>> in the footer. If a field is not written, then what should be used for
> > >>> its offset?
> > >>>
> > >>> On Mon, Oct 31, 2016 at 4:56 PM, Valentin Kulichenko <
> > >>> valentin.kulichenko@gmail.com> wrote:
> > >>>
> > >>> > Vladimir,
> > >>> >
> > >>> > These are good points, but I'm not suggesting changing the schema.
> > >>> > If one writes five fields, the schema should have five fields in any
> > >>> > case, regardless of values. I only suggest changing the internal
> > >>> > representation of the object and not saving fields with default
> > >>> > values in the byte array, as we don't really need them there.
> > >>> >
> > >>> > -Val
> > >>> >
> > >>> > On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <
> > >>> > vozerov@gridgain.com> wrote:
> > >>> >
> > >>> >> Valya,
> > >>> >>
> > >>> >> I have several concerns:
> > >>> >> 1) Correctness: hasField() will not work properly. But we can
> > >>> >> probably fix that by adding this info to the schema.
> > >>> >> 2) Performance: we have lots of optimizations which depend on either
> > >>> >> a "stable" object schema or a low number of schemas. We will
> > >>> >> effectively turn them off.
> > >>> >> But what concerns me even more is that we may end up with an enormous
> > >>> >> number of schemas. E.g. consider an object with 10 numeric fields. If
> > >>> >> each of the fields could be zero, we may end up with something like
> > >>> >> 2^10 schemas.
> > >>> >>
> > >>> >> Vladimir.
> > >>> >>
> > >>> >> On Oct 29, 2016, at 0:37, "Valentin Kulichenko" <
> > >>> >> valentin.kulichenko@gmail.com> wrote:
> > >>> >>
> > >>> >>> Vova,
> > >>> >>>
> > >>> >>> Why do we need to write zeros and nulls in the first place? What's
> > >>> >>> the value of having them in the byte array?
> > >>> >>>
> > >>> >>> -Val
> > >>> >>>
> > >>> >>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <
> > >>> >>> vozerov@gridgain.com> wrote:
> > >>> >>>
> > >>> >>>> Valya,
> > >>> >>>>
> > >>> >>>> Currently a null value is written as one byte, while a zero value of
> > >>> >>>> long type is written as 9 bytes. I want to improve that and write
> > >>> >>>> zeros as one byte as well.
> > >>> >>>>
> > >>> >>>> As for var-length encoding, I am strongly against it. It saves IO
> > >>> >>>> and memory at the cost of CPU. If we encode numbers this way, we
> > >>> >>>> will slow down SQL (which is already not very fast, to be honest),
> > >>> >>>> because instead of a single memory read we will have to perform
> > >>> >>>> multiple reads and then apply some mechanics to restore the original
> > >>> >>>> value. We already have this problem with Strings: Java stores them
> > >>> >>>> as UTF-16, but we encode them as UTF-8. As a result, every read of a
> > >>> >>>> string field in SQL incurs decoding overhead.
> > >>> >>>>
> > >>> >>>> Vladimir.
> > >>> >>>
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: BinaryObject pros/cons

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Igor,

Good catch. Probably some MAX value could help us here.


Re: BinaryObject pros/cons

Posted by Igor Sapego <is...@gridgain.com>.
Valentin,

-1 was just an example. I've checked: currently we use the whole possible
range of offset values.
So if we are going to use the suggested approach, then we need to reserve
some value and adjust the serialization/deserialization algorithms.
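
A minimal Python sketch of what reserving a value could look like. It assumes
a toy layout (4-byte little-endian ints followed by a footer of 1-byte
offsets relative to the object start) and a hypothetical marker 0xFF; none of
this is the actual Ignite binary format.

```python
import struct

ABSENT = 0xFF  # hypothetical reserved offset: "field holds its default value"


def write_object(values):
    """Serialize ints; zeros are not stored, only marked ABSENT in the footer."""
    body = bytearray()
    footer = bytearray()
    for v in values:
        if v == 0:
            footer.append(ABSENT)
            continue
        if len(body) >= ABSENT:
            # A real offset must never equal the marker, so the usable range
            # shrinks by one; a real format would switch to wider offsets here.
            raise ValueError("offset collides with reserved marker")
        footer.append(len(body))
        body += struct.pack("<i", v)
    return bytes(body + footer)


def read_field(data, field_count, index):
    """Read field `index`; the constant schema tells us `field_count`."""
    footer_start = len(data) - field_count
    offset = data[footer_start + index]
    if offset == ABSENT:
        return 0  # default value, nothing was stored
    return struct.unpack_from("<i", data, offset)[0]


data = write_object([7, 0, 0, -3])
print(len(data), [read_field(data, 4, i) for i in range(4)])
```

In this toy layout the four-field object takes 12 bytes instead of 20, and
the writer has to reject (or widen) any real offset equal to the marker,
which is the adjustment to serialization mentioned above.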

Best Regards,
Igor


Re: BinaryObject pros/cons

Posted by Valentin Kulichenko <va...@gmail.com>.
Makes sense to me, but not sure about -1 in particular. Is this offset
relative to object start position? What values can it have?

-Val


Re: BinaryObject pros/cons

Posted by Igor Sapego <is...@gridgain.com>.
Vladimir,

How about some reserved value? For example, could an offset of -1 mean that
a default/null value should be used?

Best Regards,
Igor

On Mon, Oct 31, 2016 at 5:05 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Valya,
>
> Do you have any ideas how to implement this? We write field offsets in the
> footer. If field is not written, then what should be used for its offset?
>
> On Mon, Oct 31, 2016 at 4:56 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
> > Vladimir,
> >
> > These are good points, but I'm not suggesting to change the schema. If
> one
> > writes five fields, the schema should have five fields in any case,
> > regardless of values. I only suggest to change the internal
> representation
> > of the object and do not save fields with default values in the byte
> array
> > as we don't really need them there.
> >
> > -Val
> >
> > On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <vo...@gridgain.com>
> > wrote:
> >
> >> Valya,
> >>
> >> I have several concerns:
> >> 1) Correctness: hasField() will not work properly. But probably we can
> >> fix that by adding this info to schema.
> >> 2) Performance: we have lots optimizations which depend on either
> >> "stable" object schema, or low number of schemas. We will effectively
> turn
> >> them off.
> >> But what concerns me even more, is that we may end up in enormous number
> >> of schemas. E.g. consider an object with 10 number fields. If all fields
> >> could be zero, we may end up in something like 2^10 schemas.
> >>
> >> Vladimir.
> >>
> >> 29 окт. 2016 г. 0:37 пользователь "Valentin Kulichenko" <
> >> valentin.kulichenko@gmail.com> написал:
> >>
> >> Vova,
> >>>
> >>> Why do we need to write zeros and nulls in the first place? What's the
> >>> value of having them in the byte array?
> >>>
> >>> -Val
> >>>
> >>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <vozerov@gridgain.com
> >
> >>> wrote:
> >>>
> >>>> Valya,
> >>>>
> >>>> Currently null value is written as one byte, while zero value of long
> >>>> type is written as 9 bytes. I want to improve that and write zeros as
> one
> >>>> byte as well.
> >>>>
> >>>> As per var-length encoding, I am strongly against it. It saves IO and
> >>>> memory at the cost of CPU. If we encode numbers in this way we will
> >>>> slowdown SQL (which is already not very fast, to be honest). Because
> >>>> instead of a single read memory read, we will have to perform multiple
> >>>> reads and then apply some mechanics to restore original value. We
> already
> >>>> have such problem with Strings - Java stores them as UTF-16, but we
> encode
> >>>> them as UTF-8. As a result every read of a string field in SQL
> results in
> >>>> decoding overhead.
> >>>>
> >>>> Vladimir.
> >>>>
> >>>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
> >>>> valentin.kulichenko@gmail.com> wrote:
> >>>>
> >>>>> Cross-posting this to dev list.
> >>>>>
> >>>>> Vladimir,
> >>>>>
> >>>>> To be honest, I don't see much difference between null values for
> >>>>> objects and zero values for primitives. From BinaryObject semantics
> >>>>> standpoint, both are default values for corresponding types. These
> values
> >>>>> will be returned from the BinaryObject.field() method regardless of
> whether
> >>>>> we actually save then in the byte array or not. Having said that,
> why don't
> >>>>> we just skip them during write?
> >>>>>
> >>>>> You optimization will be still useful though, because there are often
> >>>>> a lot of ints and longs that are not zeros, but still small and can
> fit 1-2
> >>>>> bytes. We already added such compaction in direct message marshaling
> and it
> >>>>> reduced overall traffic by around 30%.
> >>>>>
> >>>>> -Val
> >>>>>
> >>>>>
> >>>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <
> vozerov@gridgain.com
> >>>>> > wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I am not very concerned with null fields overhead, because usually
> it
> >>>>>> won't be significant. However, there is a problem with zeros. User
> object
> >>>>>> might have lots of int/long zeros, this is not uncommon. And each
> zero will
> >>>>>> consume 4-8 additional bytes. We probably will implement special
> >>>>>> optimization which will write such fields in special compact format.
> >>>>>>
> >>>>>> Vladimir.
> >>>>>>
> >>>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
> >>>>>> valentin.kulichenko@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Yes, null values consume memory. I believe this can be optimized, but I
> >>>>>>> haven't seen issues with this so far. Unless you have hundreds of fields
> >>>>>>> most of which are nulls (very rare case), the overhead is minimal.
> >>>>>>>
> >>>>>>> -Val
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> View this message in context:
> >>>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
> >>>>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >
>

Re: BinaryObject pros/cons

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Valya,

Do you have any ideas on how to implement this? We write field offsets in the
footer. If a field is not written, what should be used for its offset?

On Mon, Oct 31, 2016 at 4:56 PM, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> Vladimir,
>
> These are good points, but I'm not suggesting to change the schema. If one
> writes five fields, the schema should have five fields in any case,
> regardless of values. I only suggest to change the internal representation
> of the object and do not save fields with default values in the byte array
> as we don't really need them there.
>
> -Val
>
> On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
>> Valya,
>>
>> I have several concerns:
>> 1) Correctness: hasField() will not work properly. But probably we can
>> fix that by adding this info to schema.
>> 2) Performance: we have lots optimizations which depend on either
>> "stable" object schema, or low number of schemas. We will effectively turn
>> them off.
>> But what concerns me even more, is that we may end up in enormous number
>> of schemas. E.g. consider an object with 10 number fields. If all fields
>> could be zero, we may end up in something like 2^10 schemas.
>>
>> Vladimir.
>>
>> On Oct 29, 2016, at 0:37, "Valentin Kulichenko" <
>> valentin.kulichenko@gmail.com> wrote:
>>
>> Vova,
>>>
>>> Why do we need to write zeros and nulls in the first place? What's the
>>> value of having them in the byte array?
>>>
>>> -Val
>>>
>>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <vo...@gridgain.com>
>>> wrote:
>>>
>>>> Valya,
>>>>
>>>> Currently null value is written as one byte, while zero value of long
>>>> type is written as 9 bytes. I want to improve that and write zeros as one
>>>> byte as well.
>>>>
>>>> As per var-length encoding, I am strongly against it. It saves IO and
>>>> memory at the cost of CPU. If we encode numbers in this way we will
>>>> slowdown SQL (which is already not very fast, to be honest). Because
>>>> instead of a single read memory read, we will have to perform multiple
>>>> reads and then apply some mechanics to restore original value. We already
>>>> have such problem with Strings - Java stores them as UTF-16, but we encode
>>>> them as UTF-8. As a result every read of a string field in SQL results in
>>>> decoding overhead.
>>>>
>>>> Vladimir.
>>>>
>>>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
>>>> valentin.kulichenko@gmail.com> wrote:
>>>>
>>>>> Cross-posting this to dev list.
>>>>>
>>>>> Vladimir,
>>>>>
>>>>> To be honest, I don't see much difference between null values for
>>>>> objects and zero values for primitives. From BinaryObject semantics
>>>>> standpoint, both are default values for corresponding types. These values
>>>>> will be returned from the BinaryObject.field() method regardless of whether
>>>>> we actually save then in the byte array or not. Having said that, why don't
>>>>> we just skip them during write?
>>>>>
>>>>> You optimization will be still useful though, because there are often
>>>>> a lot of ints and longs that are not zeros, but still small and can fit 1-2
>>>>> bytes. We already added such compaction in direct message marshaling and it
>>>>> reduced overall traffic by around 30%.
>>>>>
>>>>> -Val
>>>>>
>>>>>
>>>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <vozerov@gridgain.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am not very concerned with null fields overhead, because usually it
>>>>>> won't be significant. However, there is a problem with zeros. User object
>>>>>> might have lots of int/long zeros, this is not uncommon. And each zero will
>>>>>> consume 4-8 additional bytes. We probably will implement special
>>>>>> optimization which will write such fields in special compact format.
>>>>>>
>>>>>> Vladimir.
>>>>>>
>>>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yes, null values consume memory. I believe this can be optimized,
>>>>>>> but I
>>>>>>> haven't seen issues with this so far. Unless you have hundreds of
>>>>>>> fields
>>>>>>> most of which are nulls (very rare case), the overhead is minimal.
>>>>>>>
>>>>>>> -Val
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>>>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Re: BinaryObject pros/cons

Posted by Valentin Kulichenko <va...@gmail.com>.
Vladimir,

These are good points, but I'm not suggesting that we change the schema. If
one writes five fields, the schema should have five fields in any case,
regardless of values. I only suggest changing the internal representation
of the object so that fields with default values are not saved in the byte
array, as we don't really need them there.
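
To illustrate the idea, here is a minimal toy sketch in Java. It is not
Ignite's BinaryObject implementation: the class SparseObjectSketch, its
methods, and the use of a map as a stand-in for the byte array are all
assumptions made for this example. It only shows that the schema can keep
all five fields while default values take no space in the stored
representation.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; this is not Ignite's BinaryObject implementation. */
final class SparseObjectSketch {
    /** Every field that was written, regardless of its value. */
    private final Set<String> schema = new LinkedHashSet<>();

    /** Hypothetical stand-in for the byte array: holds only non-default values. */
    private final Map<String, Long> stored = new LinkedHashMap<>();

    void writeLong(String name, long value) {
        schema.add(name);

        if (value != 0L)
            stored.put(name, value);
        else
            stored.remove(name); // a default value takes no space
    }

    /** Answered from the schema, so skipped fields are still reported. */
    boolean hasField(String name) {
        return schema.contains(name);
    }

    /** Returns the stored value, or the default (0) when the field was skipped. */
    long longField(String name) {
        if (!schema.contains(name))
            throw new IllegalArgumentException("Unknown field: " + name);

        return stored.getOrDefault(name, 0L);
    }

    int schemaSize() {
        return schema.size();
    }

    int storedCount() {
        return stored.size();
    }

    public static void main(String[] args) {
        SparseObjectSketch obj = new SparseObjectSketch();

        obj.writeLong("a", 7L);
        obj.writeLong("b", 0L);
        obj.writeLong("c", 0L);
        obj.writeLong("d", 42L);
        obj.writeLong("e", 0L);

        System.out.println(obj.schemaSize() + " fields in schema, "
            + obj.storedCount() + " stored");
    }
}
```

The sketch leaves the footer offset question open; it only covers the
read side, where a skipped field yields the type's default value.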

-Val

On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Valya,
>
> I have several concerns:
> 1) Correctness: hasField() will not work properly. But probably we can fix
> that by adding this info to schema.
> 2) Performance: we have lots optimizations which depend on either "stable"
> object schema, or low number of schemas. We will effectively turn them off.
> But what concerns me even more, is that we may end up in enormous number
> of schemas. E.g. consider an object with 10 number fields. If all fields
> could be zero, we may end up in something like 2^10 schemas.
>
> Vladimir.
>
> On Oct 29, 2016, at 0:37, "Valentin Kulichenko" <
> valentin.kulichenko@gmail.com> wrote:
>
> Vova,
>>
>> Why do we need to write zeros and nulls in the first place? What's the
>> value of having them in the byte array?
>>
>> -Val
>>
>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <vo...@gridgain.com>
>> wrote:
>>
>>> Valya,
>>>
>>> Currently null value is written as one byte, while zero value of long
>>> type is written as 9 bytes. I want to improve that and write zeros as one
>>> byte as well.
>>>
>>> As per var-length encoding, I am strongly against it. It saves IO and
>>> memory at the cost of CPU. If we encode numbers in this way we will
>>> slowdown SQL (which is already not very fast, to be honest). Because
>>> instead of a single read memory read, we will have to perform multiple
>>> reads and then apply some mechanics to restore original value. We already
>>> have such problem with Strings - Java stores them as UTF-16, but we encode
>>> them as UTF-8. As a result every read of a string field in SQL results in
>>> decoding overhead.
>>>
>>> Vladimir.
>>>
>>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
>>> valentin.kulichenko@gmail.com> wrote:
>>>
>>>> Cross-posting this to dev list.
>>>>
>>>> Vladimir,
>>>>
>>>> To be honest, I don't see much difference between null values for
>>>> objects and zero values for primitives. From BinaryObject semantics
>>>> standpoint, both are default values for corresponding types. These values
>>>> will be returned from the BinaryObject.field() method regardless of whether
>>>> we actually save then in the byte array or not. Having said that, why don't
>>>> we just skip them during write?
>>>>
>>>> You optimization will be still useful though, because there are often a
>>>> lot of ints and longs that are not zeros, but still small and can fit 1-2
>>>> bytes. We already added such compaction in direct message marshaling and it
>>>> reduced overall traffic by around 30%.
>>>>
>>>> -Val
>>>>
>>>>
>>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <vo...@gridgain.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am not very concerned with null fields overhead, because usually it
>>>>> won't be significant. However, there is a problem with zeros. User object
>>>>> might have lots of int/long zeros, this is not uncommon. And each zero will
>>>>> consume 4-8 additional bytes. We probably will implement special
>>>>> optimization which will write such fields in special compact format.
>>>>>
>>>>> Vladimir.
>>>>>
>>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Yes, null values consume memory. I believe this can be optimized, but
>>>>>> I
>>>>>> haven't seen issues with this so far. Unless you have hundreds of
>>>>>> fields
>>>>>> most of which are nulls (very rare case), the overhead is minimal.
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>

Re: BinaryObject pros/cons

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Valya,

I have several concerns:
1) Correctness: hasField() will not work properly. But we can probably fix
that by adding this info to the schema.
2) Performance: we have lots of optimizations which depend on either a
"stable" object schema or a low number of schemas. We will effectively turn
them off. But what concerns me even more is that we may end up with an
enormous number of schemas. E.g. consider an object with 10 number fields.
If any of the fields could be zero, we may end up with something like 2^10
schemas.
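
The 2^10 figure can be checked with a small Java sketch. It assumes that a
schema is identified by the exact set of fields actually written; the class
name SchemaCountSketch, the field names f0..f9, and the comma-joined string
used as a schema key are hypothetical stand-ins, not the identifier the
marshaller really computes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SchemaCountSketch {
    /**
     * Counts distinct "schemas" when every field may independently be
     * skipped (zero) or written (non-zero).
     */
    static int distinctSchemas(int fieldCount) {
        Set<String> schemas = new HashSet<>();

        // Each mask is one possible object: bit i set means field i is written.
        for (int mask = 0; mask < (1 << fieldCount); mask++) {
            List<String> written = new ArrayList<>();

            for (int i = 0; i < fieldCount; i++) {
                if ((mask & (1 << i)) != 0)
                    written.add("f" + i);
            }

            // Hypothetical schema key: the ordered list of written field names.
            schemas.add(String.join(",", written));
        }

        return schemas.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctSchemas(10)); // 1024 = 2^10
    }
}
```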

Vladimir.

On Oct 29, 2016, at 0:37, "Valentin Kulichenko" <
valentin.kulichenko@gmail.com> wrote:

> Vova,
>
> Why do we need to write zeros and nulls in the first place? What's the
> value of having them in the byte array?
>
> -Val
>
> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
>> Valya,
>>
>> Currently null value is written as one byte, while zero value of long
>> type is written as 9 bytes. I want to improve that and write zeros as one
>> byte as well.
>>
>> As per var-length encoding, I am strongly against it. It saves IO and
>> memory at the cost of CPU. If we encode numbers in this way we will
>> slowdown SQL (which is already not very fast, to be honest). Because
>> instead of a single read memory read, we will have to perform multiple
>> reads and then apply some mechanics to restore original value. We already
>> have such problem with Strings - Java stores them as UTF-16, but we encode
>> them as UTF-8. As a result every read of a string field in SQL results in
>> decoding overhead.
>>
>> Vladimir.
>>
>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
>> valentin.kulichenko@gmail.com> wrote:
>>
>>> Cross-posting this to dev list.
>>>
>>> Vladimir,
>>>
>>> To be honest, I don't see much difference between null values for
>>> objects and zero values for primitives. From BinaryObject semantics
>>> standpoint, both are default values for corresponding types. These values
>>> will be returned from the BinaryObject.field() method regardless of whether
>>> we actually save then in the byte array or not. Having said that, why don't
>>> we just skip them during write?
>>>
>>> You optimization will be still useful though, because there are often a
>>> lot of ints and longs that are not zeros, but still small and can fit 1-2
>>> bytes. We already added such compaction in direct message marshaling and it
>>> reduced overall traffic by around 30%.
>>>
>>> -Val
>>>
>>>
>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <vo...@gridgain.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am not very concerned with null fields overhead, because usually it
>>>> won't be significant. However, there is a problem with zeros. User object
>>>> might have lots of int/long zeros, this is not uncommon. And each zero will
>>>> consume 4-8 additional bytes. We probably will implement special
>>>> optimization which will write such fields in special compact format.
>>>>
>>>> Vladimir.
>>>>
>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
>>>> valentin.kulichenko@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Yes, null values consume memory. I believe this can be optimized, but I
>>>>> haven't seen issues with this so far. Unless you have hundreds of
>>>>> fields
>>>>> most of which are nulls (very rare case), the overhead is minimal.
>>>>>
>>>>> -Val
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>>>>
>>>>
>>>>
>>>
>>
>

Re: BinaryObject pros/cons

Posted by Valentin Kulichenko <va...@gmail.com>.
Vova,

Why do we need to write zeros and nulls in the first place? What's the
value of having them in the byte array?

-Val

On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Valya,
>
> Currently null value is written as one byte, while zero value of long type
> is written as 9 bytes. I want to improve that and write zeros as one byte
> as well.
>
> As per var-length encoding, I am strongly against it. It saves IO and
> memory at the cost of CPU. If we encode numbers in this way we will
> slowdown SQL (which is already not very fast, to be honest). Because
> instead of a single read memory read, we will have to perform multiple
> reads and then apply some mechanics to restore original value. We already
> have such problem with Strings - Java stores them as UTF-16, but we encode
> them as UTF-8. As a result every read of a string field in SQL results in
> decoding overhead.
>
> Vladimir.
>
> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
>> Cross-posting this to dev list.
>>
>> Vladimir,
>>
>> To be honest, I don't see much difference between null values for objects
>> and zero values for primitives. From BinaryObject semantics standpoint,
>> both are default values for corresponding types. These values will be
>> returned from the BinaryObject.field() method regardless of whether we
>> actually save then in the byte array or not. Having said that, why don't we
>> just skip them during write?
>>
>> You optimization will be still useful though, because there are often a
>> lot of ints and longs that are not zeros, but still small and can fit 1-2
>> bytes. We already added such compaction in direct message marshaling and it
>> reduced overall traffic by around 30%.
>>
>> -Val
>>
>>
>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <vo...@gridgain.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am not very concerned with null fields overhead, because usually it
>>> won't be significant. However, there is a problem with zeros. User object
>>> might have lots of int/long zeros, this is not uncommon. And each zero will
>>> consume 4-8 additional bytes. We probably will implement special
>>> optimization which will write such fields in special compact format.
>>>
>>> Vladimir.
>>>
>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
>>> valentin.kulichenko@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes, null values consume memory. I believe this can be optimized, but I
>>>> haven't seen issues with this so far. Unless you have hundreds of fields
>>>> most of which are nulls (very rare case), the overhead is minimal.
>>>>
>>>> -Val
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>>>
>>>
>>>
>>
>

Re: BinaryObject pros/cons

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Valya,

Currently a null value is written as one byte, while a zero value of long
type is written as 9 bytes. I want to improve that and write zeros as one
byte as well.

As for var-length encoding, I am strongly against it. It saves IO and
memory at the cost of CPU. If we encode numbers this way we will slow down
SQL (which is already not very fast, to be honest), because instead of a
single memory read we will have to perform multiple reads and then apply
some mechanics to restore the original value. We already have such a
problem with Strings - Java stores them as UTF-16, but we encode them as
UTF-8. As a result every read of a string field in SQL results in decoding
overhead.
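
The CPU cost can be seen in a minimal Java sketch of a generic
variable-length encoding (7 payload bits per byte, high bit meaning "more
bytes follow"). This is an assumed, textbook-style scheme chosen for
illustration, not the format used by Ignite's direct message marshaling;
the class VarLongSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;

final class VarLongSketch {
    /** Writes the value with 7 payload bits per byte; returns the number of bytes used. */
    static int writeVarLong(ByteBuffer buf, long value) {
        int start = buf.position();

        while ((value & ~0x7FL) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80)); // high bit: more bytes follow
            value >>>= 7;
        }

        buf.put((byte) value);

        return buf.position() - start;
    }

    /** Reads byte by byte, shifting and OR-ing until a byte without the high bit. */
    static long readVarLong(ByteBuffer buf) {
        long result = 0;
        int shift = 0;

        while (true) {
            byte b = buf.get();

            result |= (long) (b & 0x7F) << shift;

            if ((b & 0x80) == 0)
                return result;

            shift += 7;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);

        int size = writeVarLong(buf, 300L);
        buf.flip();

        System.out.println(size + " bytes, value " + readVarLong(buf));

        // Fixed-width alternative: always 8 bytes, but a single read call.
        ByteBuffer fixed = ByteBuffer.allocate(8);
        fixed.putLong(300L);
        fixed.flip();

        System.out.println("8 bytes, value " + fixed.getLong());
    }
}
```

With this scheme a small value fits in 1-2 bytes, but every read goes
through a loop with a branch per byte, whereas the fixed-width form is one
getLong() call.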

Vladimir.

On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> Cross-posting this to dev list.
>
> Vladimir,
>
> To be honest, I don't see much difference between null values for objects
> and zero values for primitives. From BinaryObject semantics standpoint,
> both are default values for corresponding types. These values will be
> returned from the BinaryObject.field() method regardless of whether we
> actually save them in the byte array or not. Having said that, why don't we
> just skip them during write?
>
> Your optimization will still be useful though, because there are often a
> lot of ints and longs that are not zeros, but still small and can fit 1-2
> bytes. We already added such compaction in direct message marshaling and it
> reduced overall traffic by around 30%.
>
> -Val
>
>
> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
>> Hi,
>>
>> I am not very concerned with null fields overhead, because usually it
>> won't be significant. However, there is a problem with zeros. User object
>> might have lots of int/long zeros, this is not uncommon. And each zero will
>> consume 4-8 additional bytes. We probably will implement special
>> optimization which will write such fields in special compact format.
>>
>> Vladimir.
>>
>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
>> valentin.kulichenko@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Yes, null values consume memory. I believe this can be optimized, but I
>>> haven't seen issues with this so far. Unless you have hundreds of fields
>>> most of which are nulls (very rare case), the overhead is minimal.
>>>
>>> -Val
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>>
>>
>>
>
