You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Alexander Paschenko <al...@gmail.com> on 2016/09/26 19:40:15 UTC

Default hash code generation strategy for new binary objects

Hello Igniters,

As DML support is near, it's critical that we agree on how we generate
hash codes for new keys in presence of binary marshaller. Actually,
this discussion isn't new - please see its beginning here:

http://apache-ignite-developers.2346864.n4.nabble.com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-at-the-same-partition-by-default-td8042.html

Still, I'm creating this new thread to make getting to the final
solution as simple and fast as possible.

I remind everyone that the approach that has got the least critics was
the one proposed by Vladimir Ozerov:

<quote>
I think we can do the following:
1) Add "has hash code" flag as Denis suggested.
2) If object without a hash code is put to cache, throw an exception.
3) Add *BinaryEqualsHashCodeResolver *interface.
4) Add default implementation which will auto-generate hash code. *Print a
warning when auto-generation occurs*, so that user is aware that he is
likely to have problems with normal GETs/PUTs.
5) Add another implementation which will use encoded string to calculate a
hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
Originally proposed by Yakov some time ago.
</quote>

After that, Sergi suggested that instead of a "formula" we keep just a
list of the "fields" that participate in hash code evaluation, and
with that list, we simply calculate hash code just like IDE does -
with all its bit shifts and additions.

I'm planning on settling down with this combined Vlad-Sergi approach.
Any objections?

And an extra question I have: Vlad, you suggest that we both throw an
exception on cache code absence and that we might generate it as the
last resort. Do I understand you correctly that you suggest generating
random code only in context of SQL, but throw exception for keys
without codes on ordinary put?

And yes, built-in hash codes for JDK types are supported as well as
items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
for the flag and its presence check).

- Alex

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

Current design is documented here:
https://issues.apache.org/jira/browse/IGNITE-4011, it incorps both
equals and hashCode.

- Alex

2016-10-03 15:24 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> On Mon, Oct 3, 2016 at 5:03 AM, Alexey Goncharuk <alexey.goncharuk@gmail.com
>> wrote:
>
>> Dima,
>>
>> Why do you think somebody will need to override equals? Currently we do not
>> have such an ability and AFAIK we did not have a single question regarding
>> this. Other products, such as Hazelcast, rely solely on binary
>> representation of a key. If this is never used, why do we need to increase
>> the configuration complexity?
>>
>
> Agree. As I mentioned, let's proceed with the current design and see if
> anyone asks to override "equals" method. We can always add it later.
>
>
>>
>> 2016-10-01 5:25 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>
>> > Alex,
>> >
>> > I can't post in the ticket, because my Jira login stopped working, so I
>> > will post here.
>> >
>> > I only have 1 question - do we purposely not support custom equals
>> > implementation? Seems we could simply add 2 methods to the
>> > BinaryObjectHashCodeResolver: isUseEquals() and computeEquals(). Having
>> > said that, I am OK with current design, we can always add equals support
>> > later.
>> >
>> > Otherwise, looks good.
>> >
>> > D.
>> >
>> > On Thu, Sep 29, 2016 at 5:19 PM, Alexander Paschenko <
>> > alexander.a.paschenko@gmail.com> wrote:
>> >
>> > > I've posted proposed example of hash code resolver interface and XML
>> > > configuration for classless key on issue page
>> > > https://issues.apache.org/jira/browse/IGNITE-2294.
>> > >
>> > > 2016-09-29 20:16 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>> > > > On Thu, Sep 29, 2016 at 9:57 AM, Denis Magda <dm...@gridgain.com>
>> > > wrote:
>> > > >
>> > > >> Alex,
>> > > >>
>> > > >> A minor note regarding this
>> > > >>
>> > > >> > On Sep 29, 2016, at 8:39 AM, Alexey Goncharuk <
>> > > >> alexey.goncharuk@gmail.com> wrote:
>> > > >> >
>> > > >> > A set of fields participating in hashCode and equals is impossible
>> > to
>> > > >> > change without cluster restart.
>> > > >>
>> > > >> It’s unlikely that someone will change a key or at least it should
>> be
>> > a
>> > > >> rare or accidental operation. In any case if this happens a user
>> must
>> > > >> upgrade all the keys he already has in a cache. To resolve this it’s
>> > > >> simpler to create a new cache with updated configuration and
>> populate
>> > > data
>> > > >> there. This will not require us to restart a cluster.
>> > > >>
>> > > >
>> > > > We should have a check in code that would prohibit changing hashcode
>> > > fields
>> > > > or the hashcode resolver class within the same cache. Using a
>> different
>> > > > cache to store the keys with new hashcodes is always an option and
>> does
>> > > not
>> > > > require anything special from our side.
>> > > >
>> > > >
>> > > >>
>> > > >> —
>> > > >> Denis
>> > >
>> >
>>

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Mon, Oct 3, 2016 at 5:03 AM, Alexey Goncharuk <alexey.goncharuk@gmail.com
> wrote:

> Dima,
>
> Why do you think somebody will need to override equals? Currently we do not
> have such an ability and AFAIK we did not have a single question regarding
> this. Other products, such as Hazelcast, rely solely on binary
> representation of a key. If this is never used, why do we need to increase
> the configuration complexity?
>

Agree. As I mentioned, let's proceed with the current design and see if
anyone asks to override "equals" method. We can always add it later.


>
> 2016-10-01 5:25 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > Alex,
> >
> > I can't post in the ticket, because my Jira login stopped working, so I
> > will post here.
> >
> > I only have 1 question - do we purposely not support custom equals
> > implementation? Seems we could simply add 2 methods to the
> > BinaryObjectHashCodeResolver: isUseEquals() and computeEquals(). Having
> > said that, I am OK with current design, we can always add equals support
> > later.
> >
> > Otherwise, looks good.
> >
> > D.
> >
> > On Thu, Sep 29, 2016 at 5:19 PM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> >
> > > I've posted proposed example of hash code resolver interface and XML
> > > configuration for classless key on issue page
> > > https://issues.apache.org/jira/browse/IGNITE-2294.
> > >
> > > 2016-09-29 20:16 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > > > On Thu, Sep 29, 2016 at 9:57 AM, Denis Magda <dm...@gridgain.com>
> > > wrote:
> > > >
> > > >> Alex,
> > > >>
> > > >> A minor note regarding this
> > > >>
> > > >> > On Sep 29, 2016, at 8:39 AM, Alexey Goncharuk <
> > > >> alexey.goncharuk@gmail.com> wrote:
> > > >> >
> > > >> > A set of fields participating in hashCode and equals is impossible
> > to
> > > >> > change without cluster restart.
> > > >>
> > > >> It’s unlikely that someone will change a key or at least it should
> be
> > a
> > > >> rare or accidental operation. In any case if this happens a user
> must
> > > >> upgrade all the keys he already has in a cache. To resolve this it’s
> > > >> simpler to create a new cache with updated configuration and
> populate
> > > data
> > > >> there. This will not require us to restart a cluster.
> > > >>
> > > >
> > > > We should have a check in code that would prohibit changing hashcode
> > > fields
> > > > or the hashcode resolver class within the same cache. Using a
> different
> > > > cache to store the keys with new hashcodes is always an option and
> does
> > > not
> > > > require anything special from our side.
> > > >
> > > >
> > > >>
> > > >> —
> > > >> Denis
> > >
> >
>

Re: Default hash code generation strategy for new binary objects

Posted by Alexey Goncharuk <al...@gmail.com>.

Dima,

Why do you think somebody will need to override equals? Currently we do not
have such an ability and AFAIK we did not have a single question regarding
this. Other products, such as Hazelcast, rely solely on binary
representation of a key. If this is never used, why do we need to increase
the configuration complexity?

2016-10-01 5:25 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Alex,
>
> I can't post in the ticket, because my Jira login stopped working, so I
> will post here.
>
> I only have 1 question - do we purposely not support custom equals
> implementation? Seems we could simply add 2 methods to the
> BinaryObjectHashCodeResolver: isUseEquals() and computeEquals(). Having
> said that, I am OK with current design, we can always add equals support
> later.
>
> Otherwise, looks good.
>
> D.
>
> On Thu, Sep 29, 2016 at 5:19 PM, Alexander Paschenko <
> alexander.a.paschenko@gmail.com> wrote:
>
> > I've posted proposed example of hash code resolver interface and XML
> > configuration for classless key on issue page
> > https://issues.apache.org/jira/browse/IGNITE-2294.
> >
> > 2016-09-29 20:16 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > > On Thu, Sep 29, 2016 at 9:57 AM, Denis Magda <dm...@gridgain.com>
> > wrote:
> > >
> > >> Alex,
> > >>
> > >> A minor note regarding this
> > >>
> > >> > On Sep 29, 2016, at 8:39 AM, Alexey Goncharuk <
> > >> alexey.goncharuk@gmail.com> wrote:
> > >> >
> > >> > A set of fields participating in hashCode and equals is impossible
> to
> > >> > change without cluster restart.
> > >>
> > >> It’s unlikely that someone will change a key or at least it should be
> a
> > >> rare or accidental operation. In any case if this happens a user must
> > >> upgrade all the keys he already has in a cache. To resolve this it’s
> > >> simpler to create a new cache with updated configuration and populate
> > data
> > >> there. This will not require us to restart a cluster.
> > >>
> > >
> > > We should have a check in code that would prohibit changing hashcode
> > fields
> > > or the hashcode resolver class within the same cache. Using a different
> > > cache to store the keys with new hashcodes is always an option and does
> > not
> > > require anything special from our side.
> > >
> > >
> > >>
> > >> —
> > >> Denis
> >
>

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Alex,

I can't post in the ticket, because my Jira login stopped working, so I
will post here.

I only have 1 question - do we purposely not support custom equals
implementation? Seems we could simply add 2 methods to the
BinaryObjectHashCodeResolver: isUseEquals() and computeEquals(). Having
said that, I am OK with current design, we can always add equals support
later.

Otherwise, looks good.

D.

On Thu, Sep 29, 2016 at 5:19 PM, Alexander Paschenko <
alexander.a.paschenko@gmail.com> wrote:

> I've posted proposed example of hash code resolver interface and XML
> configuration for classless key on issue page
> https://issues.apache.org/jira/browse/IGNITE-2294.
>
> 2016-09-29 20:16 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > On Thu, Sep 29, 2016 at 9:57 AM, Denis Magda <dm...@gridgain.com>
> wrote:
> >
> >> Alex,
> >>
> >> A minor note regarding this
> >>
> >> > On Sep 29, 2016, at 8:39 AM, Alexey Goncharuk <
> >> alexey.goncharuk@gmail.com> wrote:
> >> >
> >> > A set of fields participating in hashCode and equals is impossible to
> >> > change without cluster restart.
> >>
> >> It’s unlikely that someone will change a key or at least it should be a
> >> rare or accidental operation. In any case if this happens a user must
> >> upgrade all the keys he already has in a cache. To resolve this it’s
> >> simpler to create a new cache with updated configuration and populate
> data
> >> there. This will not require us to restart a cluster.
> >>
> >
> > We should have a check in code that would prohibit changing hashcode
> fields
> > or the hashcode resolver class within the same cache. Using a different
> > cache to store the keys with new hashcodes is always an option and does
> not
> > require anything special from our side.
> >
> >
> >>
> >> —
> >> Denis
>

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

I've posted proposed example of hash code resolver interface and XML
configuration for classless key on issue page
https://issues.apache.org/jira/browse/IGNITE-2294.

2016-09-29 20:16 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> On Thu, Sep 29, 2016 at 9:57 AM, Denis Magda <dm...@gridgain.com> wrote:
>
>> Alex,
>>
>> A minor note regarding this
>>
>> > On Sep 29, 2016, at 8:39 AM, Alexey Goncharuk <
>> alexey.goncharuk@gmail.com> wrote:
>> >
>> > A set of fields participating in hashCode and equals is impossible to
>> > change without cluster restart.
>>
>> It’s unlikely that someone will change a key or at least it should be a
>> rare or accidental operation. In any case if this happens a user must
>> upgrade all the keys he already has in a cache. To resolve this it’s
>> simpler to create a new cache with updated configuration and populate data
>> there. This will not require us to restart a cluster.
>>
>
> We should have a check in code that would prohibit changing hashcode fields
> or the hashcode resolver class within the same cache. Using a different
> cache to store the keys with new hashcodes is always an option and does not
> require anything special from our side.
>
>
>>
>> —
>> Denis

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Thu, Sep 29, 2016 at 9:57 AM, Denis Magda <dm...@gridgain.com> wrote:

> Alex,
>
> A minor note regarding this
>
> > On Sep 29, 2016, at 8:39 AM, Alexey Goncharuk <
> alexey.goncharuk@gmail.com> wrote:
> >
> > A set of fields participating in hashCode and equals is impossible to
> > change without cluster restart.
>
> It’s unlikely that someone will change a key or at least it should be a
> rare or accidental operation. In any case if this happens a user must
> upgrade all the keys he already has in a cache. To resolve this it’s
> simpler to create a new cache with updated configuration and populate data
> there. This will not require us to restart a cluster.
>

We should have a check in code that would prohibit changing hashcode fields
or the hashcode resolver class within the same cache. Using a different
cache to store the keys with new hashcodes is always an option and does not
require anything special from our side.

>
> —
> Denis

Re: Default hash code generation strategy for new binary objects

Posted by Denis Magda <dm...@gridgain.com>.

Alex, 

A minor note regarding this

> On Sep 29, 2016, at 8:39 AM, Alexey Goncharuk <al...@gmail.com> wrote:
> 
> A set of fields participating in hashCode and equals is impossible to
> change without cluster restart. 

It’s unlikely that someone will change a key or at least it should be a rare or accidental operation. In any case if this happens a user must upgrade all the keys he already has in a cache. To resolve this it’s simpler to create a new cache with updated configuration and populate data there. This will not require us to restart a cluster.

—
Denis

Re: Default hash code generation strategy for new binary objects

Posted by Alexey Goncharuk <al...@gmail.com>.

In the assumption that we do not have 'non-equals' fields, binary array
comparison generally should be faster, because even  if we did a
field-by-field comparison, we would read the same amount of data anyway,
but also would need to do some byte jiggling for type conversion.

2016-09-29 19:07 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Alexey, in your opinion, what will be faster, the binary array comparison
> or field-by-field comparison?
>
> On Thu, Sep 29, 2016 at 8:39 AM, Alexey Goncharuk <
> alexey.goncharuk@gmail.com> wrote:
>
> > Folks, let me point out a few obvious (or not) things
> >
> > A set of fields participating in hashCode and equals is impossible to
> > change without cluster restart. Imagine a new client adding a field F to
> > key structure (A, B) so that new key is (A, B, F). In this case key (1,
> 2,
> > 0) will be treated as a different key w/respect to (1, 2, 3), while the
> new
> > client expects them to be the same. I think the set of fields defining
> the
> > key _must_ be used in both hash code and equals calculation and cannot
> > change over time. Having said that, we can use binary array comparison as
> > object equality test. Presence of new 'garbage' fields in key looks
> useless
> > to me.
> >
> > So, we should support:
> > 1) All the things Dmitriy pointed out, in other words - multi-field key
> > with affinity key fields
> > 2) Ability to instantiate the key using a text-only environment
> > 3) Ability to define the structure of a key using XML or DDL
> >
> > I think all of these can be done via combination of HashCodeResolver
> > interface + default resolver which can read type configuration.
> >
> > 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> > > Guys,
> > >
> > > We need to look at 3 cases:
> > >
> > > a) key is just one field
> > > b) key is multiple fields
> > > c) key is one or multiple fields, with possibility of an alternate
> > affinity
> > > key
> > >
> > > For (a) and (b), whenever a type is defined in XML, and further in
> DML, a
> > > user will specify which fields are part of the key. In that case, we
> > should
> > > just grab those fields and calculate the hashcode automatically.
> > >
> > > For (c), a user should specify in XML, and further in DML, which fields
> > > should be used for affinity, in addition to the key fields. In that
> case,
> > > we again should grab those fields and calculate the hashcodes for the
> > > primary key and the affinity key.
> > >
> > > I really am not sure if there are other ways of doing it. Am I missing
> > > something?
> > >
> > > D.
> > >
> > >
> > > On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com>
> > wrote:
> > >
> > > > Let me show the picture I have in my mind:
> > > >
> > > > Primary key is a must for all INSERT and MERGE operations. If it’s
> not
> > > set
> > > > then an INSERT/MERGE fails.
> > > > If a primary key is a boxed/unboxed primitive (int, Integer, String,
> > > Date,
> > > > etc.) then the key value is used for hashCode calculation. At the
> same
> > > time
> > > > the key will be an affinity key.
> > > > If a primary key is a custom object then it’s value can be passed as
> a
> > > > param directly from Java, .Net, C++ in a way like “INSERT (_key,
> > field1,
> > > > field2) VALUES (?, val1, val2)”. In this scenario we will call
> hashCode
> > > > directly on the key's value. In addition, we will be able to get an
> > > > affinity key since we know key’s type class descriptor
> > > > (BinaryClassDescriptor).
> > > > If a primary key is still a custom key and we want to insert its
> value
> > > > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s
> > value
> > > > as is. Here we’re trying to apply a workaround by listing key's
> fields
> > in
> > > > INSERT/MERGE and the task is to properly re-construct the key on our
> > side
> > > > using only specific fields.
> > > >
> > > > Is my understanding correct for all the bullets above?
> > > >
> > > > If so then, yes, I would agree that we need to list these fields in a
> > > > configuration and the default hash code resolver will use them as
> well.
> > > > Moreover, we have to pin point an affinity field. So, the question is
> > > what
> > > > the configuration we should use.
> > > >
> > > > Community, any other thoughts/ideas?
> > > >
> > > > —
> > > > Denis
> > > >
> > > > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> > > > alexander.a.paschenko@gmail.com> wrote:
> > > > >
> > > > > Also MERGE.
> > > > >
> > > > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > > >> You need a hash code only for INSERT operation, right?
> > > > >>
> > > > >> —
> > > > >> Denis
> > > > >>
> > > > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> > > > alexander.a.paschenko@gmail.com> wrote:
> > > > >>>
> > > > >>> But what if the user works from some kind of console and just
> types
> > > > >>> the queries as text in full and does not bind params via JDBC or
> > > > >>> something alike? What if there's no binary object? I don't see
> why
> > we
> > > > >>> should keep the user from usual cache gets in this case. I really
> > > like
> > > > >>> the idea of supplying the values of distinct fields, thus freeing
> > the
> > > > >>> user of the need to mess with objects and builders, AND then just
> > > > >>> calculating hash code as suggested before - say, via explicitly
> > > > >>> listing participating fields in XML or by marking them with
> > transient
> > > > >>> keyword or some annotation.
> > > > >>> Actually, I believe that's the only case when we need to generate
> > any
> > > > >>> hash codes - when the class is present, we can just get hash code
> > > from
> > > > >>> its implementation of its method. When there's no class, we
> > generate.
> > > > >>> And all that is solely for SQL. For the rest - just throw an
> > > exception
> > > > >>> when there's no hash code manually set for binary object. I don't
> > see
> > > > >>> why we should try to generate anything when the user already is
> > using
> > > > >>> Ignite in full, not just via limited interface of SQL.
> > > > >>>
> > > > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > > >>>> Hmm, this is a good question.
> > > > >>>>
> > > > >>>> If a user doesn’t provide a _key when an INSERT is executed for
> me
> > > it
> > > > means that he is not going to use the key later for cache.get/put,
> > > DELETE,
> > > > UPDATE and other possible operation simply because he doesn’t know
> how
> > to
> > > > reconstruct the key back in his code. If he wants to use the primary
> > key
> > > in
> > > > the rest of operations then he must provide it at INSERT time.
> > > > >>>>
> > > > >>>> Do we need this key only for a case when an object is being
> > inserted
> > > > into a cache? If it’s so I would auto-generate a key using ‘long’ as
> a
> > > key
> > > > type. I do remember that we provided the auto-generation for Spark
> > module
> > > > in a some way that may be useful here.
> > > > >>>>
> > > > >>>> —
> > > > >>>> Denis
> > > > >>>>
> > > > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> > > > alexander.a.paschenko@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> Denis,
> > > > >>>>>
> > > > >>>>> That's not what I was asking about.
> > > > >>>>> Currently DML implementation allows for dymanic instantiation
> of
> > > > keys,
> > > > >>>>> in other words, user does not have to provide value for
> > > object-typed
> > > > >>>>> _key column - instead, he may supply just field values based on
> > > which
> > > > >>>>> _key will be dynamically instantiated/binary built. And that's
> > the
> > > > >>>>> whole point of this discussion as I see it: what to do when
> we've
> > > > >>>>> binary built classless key that we build ourselves inside SQL
> > > engine
> > > > >>>>> and don't know how to compute hash code for it?
> > > > >>>>>
> > > > >>>>> - Alex
> > > > >>>>>
> > > > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > > >>>>>> Alexander,
> > > > >>>>>>
> > > > >>>>>> As I guess if we have a key without a class then it will be
> > > > constructed using a BinaryBuilder instance and it’s user
> responsibility
> > > to
> > > > set the has code at the end with BinaryBuilder.hasCode method. Sure,
> > all
> > > > this cases must be well-documented in both Java Doc API and Apache
> > Ignite
> > > > documentation.
> > > > >>>>>>
> > > > >>>>>> —
> > > > >>>>>> Denis
> > > > >>>>>>
> > > > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> > > > alexander.a.paschenko@gmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>> Dmitry, Denis,
> > > > >>>>>>>
> > > > >>>>>>> OK, but I think it's necessary to address also the cases when
> > > > there's
> > > > >>>>>>> no actual class for the key, and its fields are simply
> declared
> > > in
> > > > >>>>>>> XML. In this case, there are no fields to be marked
> transient.
> > > > What do
> > > > >>>>>>> we do then? List transient fields in XML separately?
> > > > >>>>>>>
> > > > >>>>>>> - Alex
> > > > >>>>>>>
> > > > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
> > > dsetrakyan@apache.org
> > > > >:
> > > > >>>>>>>> Agree with Denis.
> > > > >>>>>>>>
> > > > >>>>>>>> - by default, all non-transient key fields should
> participate
> > in
> > > > the
> > > > >>>>>>>> hashcode generation
> > > > >>>>>>>> - when working on DDL, then the primary key fields should
> > > > participate in
> > > > >>>>>>>> the hashcode
> > > > >>>>>>>> - we should add a resolver to override the default behavior
> > > > (please
> > > > >>>>>>>> propose the interface in Jira)
> > > > >>>>>>>> - we should print out a warning, only once per type, the the
> > > > hashcode
> > > > >>>>>>>> has been automatically generated based on which fields and
> > which
> > > > formula
> > > > >>>>>>>>
> > > > >>>>>>>> D.
> > > > >>>>>>>>
> > > > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
> > > dmagda@gridgain.com>
> > > > wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Alexander,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we
> must
> > > > keep in mind
> > > > >>>>>>>>> one important thing. Binary objects were designed to
> address
> > > the
> > > > following
> > > > >>>>>>>>> disadvantages a regular serializer, like optimized
> > marshaller,
> > > > has:
> > > > >>>>>>>>> necessity to deserialize an object on a server side every
> > time
> > > > it’s needed.
> > > > >>>>>>>>> necessity to hold an object in both serialized and
> > deserialized
> > > > forms on
> > > > >>>>>>>>> the server node.
> > > > >>>>>>>>> necessity to restart the whole cluster each time an object
> > > > version is
> > > > >>>>>>>>> changed (new field is added or an old one is removed).
> > > > >>>>>>>>> If it will be needed to perform step 3 for a default
> > > > implementation of the
> > > > >>>>>>>>> binary resolver just because the resolver has to consider
> new
> > > > fields or
> > > > >>>>>>>>> ignore old ones then such an implementation sucks. Overall,
> > the
> > > > default
> > > > >>>>>>>>> implementation should use the reflection coming over all
> the
> > > > fields a key
> > > > >>>>>>>>> has ignoring the ones that are marked with “transient”
> > keyword.
> > > > If a user
> > > > >>>>>>>>> wants to control the default resolver's logic then he can
> > label
> > > > all the
> > > > >>>>>>>>> fields that mustn’t be of a final has code value with
> > > > “transient”. This has
> > > > >>>>>>>>> to be well-documented for sure.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Makes sense?
> > > > >>>>>>>>>
> > > > >>>>>>>>> —
> > > > >>>>>>>>> Denis
> > > > >>>>>>>>>
> > > > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> > > > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Hello Igniters,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> As DML support is near, it's critical that we agree on how
> > we
> > > > generate
> > > > >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
> > > > Actually,
> > > > >>>>>>>>>> this discussion isn't new - please see its beginning here:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> > > > >>>>>>>>> com/All-BinaryObjects-created-
> by-BinaryObjectBuilder-stored-
> > > > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Still, I'm creating this new thread to make getting to the
> > > final
> > > > >>>>>>>>>> solution as simple and fast as possible.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I remind everyone that the approach that has got the least
> > > > critics was
> > > > >>>>>>>>>> the one proposed by Vladimir Ozerov:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> <quote>
> > > > >>>>>>>>>> I think we can do the following:
> > > > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> > > > >>>>>>>>>> 2) If object without a hash code is put to cache, throw an
> > > > exception.
> > > > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> > > > >>>>>>>>>> 4) Add default implementation which will auto-generate
> hash
> > > > code. *Print
> > > > >>>>>>>>> a
> > > > >>>>>>>>>> warning when auto-generation occurs*, so that user is
> aware
> > > > that he is
> > > > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> > > > >>>>>>>>>> 5) Add another implementation which will use encoded
> string
> > to
> > > > calculate
> > > > >>>>>>>>> a
> > > > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} *
> > 31 +
> > > > {b}")*.
> > > > >>>>>>>>>> Originally proposed by Yakov some time ago.
> > > > >>>>>>>>>> </quote>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> After that, Sergi suggested that instead of a "formula" we
> > > keep
> > > > just a
> > > > >>>>>>>>>> list of the "fields" that participate in hash code
> > evaluation,
> > > > and
> > > > >>>>>>>>>> with that list, we simply calculate hash code just like
> IDE
> > > > does -
> > > > >>>>>>>>>> with all its bit shifts and additions.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I'm planning on settling down with this combined
> Vlad-Sergi
> > > > approach.
> > > > >>>>>>>>>> Any objections?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we
> both
> > > > throw an
> > > > >>>>>>>>>> exception on cache code absence and that we might generate
> > it
> > > > as the
> > > > >>>>>>>>>> last resort. Do I understand you correctly that you
> suggest
> > > > generating
> > > > >>>>>>>>>> random code only in context of SQL, but throw exception
> for
> > > keys
> > > > >>>>>>>>>> without codes on ordinary put?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported
> as
> > > > well as
> > > > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of
> > > > IGNITE-3633
> > > > >>>>>>>>>> for the flag and its presence check).
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> - Alex
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Alexey, in your opinion, what will be faster, the binary array comparison
or field-by-field comparison?

On Thu, Sep 29, 2016 at 8:39 AM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> Folks, let me point out a few obvious (or not) things
>
> A set of fields participating in hashCode and equals is impossible to
> change without cluster restart. Imagine a new client adding a field F to
> key structure (A, B) so that new key is (A, B, F). In this case key (1, 2,
> 0) will be treated as a different key w/respect to (1, 2, 3), while the new
> client expects them to be the same. I think the set of fields defining the
> key _must_ be used in both hash code and equals calculation and cannot
> change over time. Having said that, we can use binary array comparison as
> object equality test. Presence of new 'garbage' fields in key looks useless
> to me.
>
> So, we should support:
> 1) All the things Dmitriy pointed out, in other words - multi-field key
> with affinity key fields
> 2) Ability to instantiate the key using a text-only environment
> 3) Ability to define the structure of a key using XML or DDL
>
> I think all of these can be done via combination of HashCodeResolver
> interface + default resolver which can read type configuration.
>
> 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > Guys,
> >
> > We need to look at 3 cases:
> >
> > a) key is just one field
> > b) key is multiple fields
> > c) key is one or multiple fields, with possibility of an alternate
> affinity
> > key
> >
> > For (a) and (b), whenever a type is defined in XML, and further in DML, a
> > user will specify which fields are part of the key. In that case, we
> should
> > just grab those fields and calculate the hashcode automatically.
> >
> > For (c), a user should specify in XML, and further in DML, which fields
> > should be used for affinity, in addition to the key fields. In that case,
> > we again should grab those fields and calculate the hashcodes for the
> > primary key and the affinity key.
> >
> > I really am not sure if there are other ways of doing it. Am I missing
> > something?
> >
> > D.
> >
> >
> > On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com>
> wrote:
> >
> > > Let me show the picture I have in my mind:
> > >
> > > Primary key is a must for all INSERT and MERGE operations. If it’s not
> > set
> > > then an INSERT/MERGE fails.
> > > If a primary key is a boxed/unboxed primitive (int, Integer, String,
> > Date,
> > > etc.) then the key value is used for hashCode calculation. At the same
> > time
> > > the key will be an affinity key.
> > > If a primary key is a custom object then it’s value can be passed as a
> > > param directly from Java, .Net, C++ in a way like “INSERT (_key,
> field1,
> > > field2) VALUES (?, val1, val2)”. In this scenario we will call hashCode
> > > directly on the key's value. In addition, we will be able to get an
> > > affinity key since we know key’s type class descriptor
> > > (BinaryClassDescriptor).
> > > If a primary key is still a custom key and we want to insert its value
> > > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s
> value
> > > as is. Here we’re trying to apply a workaround by listing key's fields
> in
> > > INSERT/MERGE and the task is to properly re-construct the key on our
> side
> > > using only specific fields.
> > >
> > > Is my understanding correct for all the bullets above?
> > >
> > > If so then, yes, I would agree that we need to list these fields in a
> > > configuration and the default hash code resolver will use them as well.
> > > Moreover, we have to pin point an affinity field. So, the question is
> > what
> > > the configuration we should use.
> > >
> > > Community, any other thoughts/ideas?
> > >
> > > —
> > > Denis
> > >
> > > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> > > alexander.a.paschenko@gmail.com> wrote:
> > > >
> > > > Also MERGE.
> > > >
> > > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > >> You need a hash code only for INSERT operation, right?
> > > >>
> > > >> —
> > > >> Denis
> > > >>
> > > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> > > alexander.a.paschenko@gmail.com> wrote:
> > > >>>
> > > >>> But what if the user works from some kind of console and just types
> > > >>> the queries as text in full and does not bind params via JDBC or
> > > >>> something alike? What if there's no binary object? I don't see why
> we
> > > >>> should keep the user from usual cache gets in this case. I really
> > like
> > > >>> the idea of supplying the values of distinct fields, thus freeing
> the
> > > >>> user of the need to mess with objects and builders, AND then just
> > > >>> calculating hash code as suggested before - say, via explicitly
> > > >>> listing participating fields in XML or by marking them with
> transient
> > > >>> keyword or some annotation.
> > > >>> Actually, I believe that's the only case when we need to generate
> any
> > > >>> hash codes - when the class is present, we can just get hash code
> > from
> > > >>> its implementation of its method. When there's no class, we
> generate.
> > > >>> And all that is solely for SQL. For the rest - just throw an
> > exception
> > > >>> when there's no hash code manually set for binary object. I don't
> see
> > > >>> why we should try to generate anything when the user already is
> using
> > > >>> Ignite in full, not just via limited interface of SQL.
> > > >>>
> > > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > >>>> Hmm, this is a good question.
> > > >>>>
> > > >>>> If a user doesn’t provide a _key when an INSERT is executed for me
> > it
> > > means that he is not going to use the key later for cache.get/put,
> > DELETE,
> > > UPDATE and other possible operation simply because he doesn’t know how
> to
> > > reconstruct the key back in his code. If he wants to use the primary
> key
> > in
> > > the rest of operations then he must provide it at INSERT time.
> > > >>>>
> > > >>>> Do we need this key only for a case when an object is being
> inserted
> > > into a cache? If it’s so I would auto-generate a key using ‘long’ as a
> > key
> > > type. I do remember that we provided the auto-generation for Spark
> module
> > > in a some way that may be useful here.
> > > >>>>
> > > >>>> —
> > > >>>> Denis
> > > >>>>
> > > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> > > alexander.a.paschenko@gmail.com> wrote:
> > > >>>>>
> > > >>>>> Denis,
> > > >>>>>
> > > >>>>> That's not what I was asking about.
> > > >>>>> Currently DML implementation allows for dymanic instantiation of
> > > keys,
> > > >>>>> in other words, user does not have to provide value for
> > object-typed
> > > >>>>> _key column - instead, he may supply just field values based on
> > which
> > > >>>>> _key will be dynamically instantiated/binary built. And that's
> the
> > > >>>>> whole point of this discussion as I see it: what to do when we've
> > > >>>>> binary built classless key that we build ourselves inside SQL
> > engine
> > > >>>>> and don't know how to compute hash code for it?
> > > >>>>>
> > > >>>>> - Alex
> > > >>>>>
> > > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > >>>>>> Alexander,
> > > >>>>>>
> > > >>>>>> As I guess if we have a key without a class then it will be
> > > constructed using a BinaryBuilder instance and it’s user responsibility
> > to
> > > set the has code at the end with BinaryBuilder.hasCode method. Sure,
> all
> > > this cases must be well-documented in both Java Doc API and Apache
> Ignite
> > > documentation.
> > > >>>>>>
> > > >>>>>> —
> > > >>>>>> Denis
> > > >>>>>>
> > > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> > > alexander.a.paschenko@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>> Dmitry, Denis,
> > > >>>>>>>
> > > >>>>>>> OK, but I think it's necessary to address also the cases when
> > > there's
> > > >>>>>>> no actual class for the key, and its fields are simply declared
> > in
> > > >>>>>>> XML. In this case, there are no fields to be marked transient.
> > > What do
> > > >>>>>>> we do then? List transient fields in XML separately?
> > > >>>>>>>
> > > >>>>>>> - Alex
> > > >>>>>>>
> > > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
> > dsetrakyan@apache.org
> > > >:
> > > >>>>>>>> Agree with Denis.
> > > >>>>>>>>
> > > >>>>>>>> - by default, all non-transient key fields should participate
> in
> > > the
> > > >>>>>>>> hashcode generation
> > > >>>>>>>> - when working on DDL, then the primary key fields should
> > > participate in
> > > >>>>>>>> the hashcode
> > > >>>>>>>> - we should add a resolver to override the default behavior
> > > (please
> > > >>>>>>>> propose the interface in Jira)
> > > >>>>>>>> - we should print out a warning, only once per type, the the
> > > hashcode
> > > >>>>>>>> has been automatically generated based on which fields and
> which
> > > formula
> > > >>>>>>>>
> > > >>>>>>>> D.
> > > >>>>>>>>
> > > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
> > dmagda@gridgain.com>
> > > wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi Alexander,
> > > >>>>>>>>>
> > > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must
> > > keep in mind
> > > >>>>>>>>> one important thing. Binary objects were designed to address
> > the
> > > following
> > > >>>>>>>>> disadvantages a regular serializer, like optimized
> marshaller,
> > > has:
> > > >>>>>>>>> necessity to deserialize an object on a server side every
> time
> > > it’s needed.
> > > >>>>>>>>> necessity to hold an object in both serialized and
> deserialized
> > > forms on
> > > >>>>>>>>> the server node.
> > > >>>>>>>>> necessity to restart the whole cluster each time an object
> > > version is
> > > >>>>>>>>> changed (new field is added or an old one is removed).
> > > >>>>>>>>> If it will be needed to perform step 3 for a default
> > > implementation of the
> > > >>>>>>>>> binary resolver just because the resolver has to consider new
> > > fields or
> > > >>>>>>>>> ignore old ones then such an implementation sucks. Overall,
> the
> > > default
> > > >>>>>>>>> implementation should use the reflection coming over all the
> > > fields a key
> > > >>>>>>>>> has ignoring the ones that are marked with “transient”
> keyword.
> > > If a user
> > > >>>>>>>>> wants to control the default resolver's logic then he can
> label
> > > all the
> > > >>>>>>>>> fields that mustn’t be of a final has code value with
> > > “transient”. This has
> > > >>>>>>>>> to be well-documented for sure.
> > > >>>>>>>>>
> > > >>>>>>>>> Makes sense?
> > > >>>>>>>>>
> > > >>>>>>>>> —
> > > >>>>>>>>> Denis
> > > >>>>>>>>>
> > > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> > > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hello Igniters,
> > > >>>>>>>>>>
> > > >>>>>>>>>> As DML support is near, it's critical that we agree on how
> we
> > > generate
> > > >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
> > > Actually,
> > > >>>>>>>>>> this discussion isn't new - please see its beginning here:
> > > >>>>>>>>>>
> > > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> > > >>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
> > > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> > > >>>>>>>>>>
> > > >>>>>>>>>> Still, I'm creating this new thread to make getting to the
> > final
> > > >>>>>>>>>> solution as simple and fast as possible.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I remind everyone that the approach that has got the least
> > > critics was
> > > >>>>>>>>>> the one proposed by Vladimir Ozerov:
> > > >>>>>>>>>>
> > > >>>>>>>>>> <quote>
> > > >>>>>>>>>> I think we can do the following:
> > > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> > > >>>>>>>>>> 2) If object without a hash code is put to cache, throw an
> > > exception.
> > > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> > > >>>>>>>>>> 4) Add default implementation which will auto-generate hash
> > > code. *Print
> > > >>>>>>>>> a
> > > >>>>>>>>>> warning when auto-generation occurs*, so that user is aware
> > > that he is
> > > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> > > >>>>>>>>>> 5) Add another implementation which will use encoded string
> to
> > > calculate
> > > >>>>>>>>> a
> > > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} *
> 31 +
> > > {b}")*.
> > > >>>>>>>>>> Originally proposed by Yakov some time ago.
> > > >>>>>>>>>> </quote>
> > > >>>>>>>>>>
> > > >>>>>>>>>> After that, Sergi suggested that instead of a "formula" we
> > keep
> > > just a
> > > >>>>>>>>>> list of the "fields" that participate in hash code
> evaluation,
> > > and
> > > >>>>>>>>>> with that list, we simply calculate hash code just like IDE
> > > does -
> > > >>>>>>>>>> with all its bit shifts and additions.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi
> > > approach.
> > > >>>>>>>>>> Any objections?
> > > >>>>>>>>>>
> > > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we both
> > > throw an
> > > >>>>>>>>>> exception on cache code absence and that we might generate
> it
> > > as the
> > > >>>>>>>>>> last resort. Do I understand you correctly that you suggest
> > > generating
> > > >>>>>>>>>> random code only in context of SQL, but throw exception for
> > keys
> > > >>>>>>>>>> without codes on ordinary put?
> > > >>>>>>>>>>
> > > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported as
> > > well as
> > > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of
> > > IGNITE-3633
> > > >>>>>>>>>> for the flag and its presence check).
> > > >>>>>>>>>>
> > > >>>>>>>>>> - Alex
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>
> > >
> > >
> >
>

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Thu, Sep 29, 2016 at 9:19 AM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> I want to step back a little bit. Why do we need to choose fields for
> hashCode at all? If all fields participate in equals, all of them should
> participate in hashCode as well. We already serialize a user key in order
> to get a value from cache, so we can use a hashCode based on binary object
> representation.
>

I actually tend to agree with you. If one of the fields in the key is the
affinity field, we could use in in the hashcode as well, then.

If anyone wants to override it, we should provide a hashcode resolver.


>
> 2016-09-29 19:13 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > Alexander,
> >
> > How do you plan to annotate fields that participate in the hashcode
> > calculation? Can you add all the changes you plan to make to the
> > configuration in the ticket and post the link here?
> >
> > Also, we must make sure that hashcode fields do not change. I believe you
> > should have validation in the code on startup of a system or of a cache,
> > and throw an exception if it fails.
> >
> > D.
> >
> > On Thu, Sep 29, 2016 at 9:08 AM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> >
> > > Thanks everyone!
> > >
> > > Denis, yes, that's what I meant, now we're on the same page :)
> > > However, I'm worried about the same things Alexey is, that is, I'm not
> > > sure how we can handle presence of key fields that don't participate
> > > in 'equals' evaluation. Hence I'm all up for keeping mechanism of
> > > comparison for equality as it is now. And simply enhance the
> > > configuration as follows:
> > >
> > > 1. Introduce new BinaryObjectHashCodeResolver interface with single
> > > method having following signature:
> > >
> > > int computeHashCode(BinaryObject binObj)
> > >
> > > Cache configuration will then get a new corresponding param - probably
> > > a class/class name to instantiate resolver from.
> > >
> > > 2. Introduce default resolver by enhancing SQL engine configuration (=
> > > QueryEntity) - namely, add to it the list of hash code fields and
> > > optional affinity key field.
> > >
> > >
> > >
> > > 2016-09-29 18:39 GMT+03:00 Alexey Goncharuk <
> alexey.goncharuk@gmail.com
> > >:
> > > > Folks, let me point out a few obvious (or not) things
> > > >
> > > > A set of fields participating in hashCode and equals is impossible to
> > > > change without cluster restart. Imagine a new client adding a field F
> > to
> > > > key structure (A, B) so that new key is (A, B, F). In this case key
> (1,
> > > 2,
> > > > 0) will be treated as a different key w/respect to (1, 2, 3), while
> the
> > > new
> > > > client expects them to be the same. I think the set of fields
> defining
> > > the
> > > > key _must_ be used in both hash code and equals calculation and
> cannot
> > > > change over time. Having said that, we can use binary array
> comparison
> > as
> > > > object equality test. Presence of new 'garbage' fields in key looks
> > > useless
> > > > to me.
> > > >
> > > > So, we should support:
> > > > 1) All the things Dmitriy pointed out, in other words - multi-field
> key
> > > > with affinity key fields
> > > > 2) Ability to instantiate the key using a text-only environment
> > > > 3) Ability to define the structure of a key using XML or DDL
> > > >
> > > > I think all of these can be done via combination of HashCodeResolver
> > > > interface + default resolver which can read type configuration.
> > > >
> > > > 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > > >
> > > >> Guys,
> > > >>
> > > >> We need to look at 3 cases:
> > > >>
> > > >> a) key is just one field
> > > >> b) key is multiple fields
> > > >> c) key is one or multiple fields, with possibility of an alternate
> > > affinity
> > > >> key
> > > >>
> > > >> For (a) and (b), whenever a type is defined in XML, and further in
> > DML,
> > > a
> > > >> user will specify which fields are part of the key. In that case, we
> > > should
> > > >> just grab those fields and calculate the hashcode automatically.
> > > >>
> > > >> For (c), a user should specify in XML, and further in DML, which
> > fields
> > > >> should be used for affinity, in addition to the key fields. In that
> > > case,
> > > >> we again should grab those fields and calculate the hashcodes for
> the
> > > >> primary key and the affinity key.
> > > >>
> > > >> I really am not sure if there are other ways of doing it. Am I
> missing
> > > >> something?
> > > >>
> > > >> D.
> > > >>
> > > >>
> > > >> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com>
> > > wrote:
> > > >>
> > > >> > Let me show the picture I have in my mind:
> > > >> >
> > > >> > Primary key is a must for all INSERT and MERGE operations. If it’s
> > not
> > > >> set
> > > >> > then an INSERT/MERGE fails.
> > > >> > If a primary key is a boxed/unboxed primitive (int, Integer,
> String,
> > > >> Date,
> > > >> > etc.) then the key value is used for hashCode calculation. At the
> > same
> > > >> time
> > > >> > the key will be an affinity key.
> > > >> > If a primary key is a custom object then it’s value can be passed
> > as a
> > > >> > param directly from Java, .Net, C++ in a way like “INSERT (_key,
> > > field1,
> > > >> > field2) VALUES (?, val1, val2)”. In this scenario we will call
> > > hashCode
> > > >> > directly on the key's value. In addition, we will be able to get
> an
> > > >> > affinity key since we know key’s type class descriptor
> > > >> > (BinaryClassDescriptor).
> > > >> > If a primary key is still a custom key and we want to insert its
> > value
> > > >> > from an SQL console, PHP, Tableu, etc. then we can’t pass the
> key’s
> > > value
> > > >> > as is. Here we’re trying to apply a workaround by listing key's
> > > fields in
> > > >> > INSERT/MERGE and the task is to properly re-construct the key on
> our
> > > side
> > > >> > using only specific fields.
> > > >> >
> > > >> > Is my understanding correct for all the bullets above?
> > > >> >
> > > >> > If so then, yes, I would agree that we need to list these fields
> in
> > a
> > > >> > configuration and the default hash code resolver will use them as
> > > well.
> > > >> > Moreover, we have to pin point an affinity field. So, the question
> > is
> > > >> what
> > > >> > the configuration we should use.
> > > >> >
> > > >> > Community, any other thoughts/ideas?
> > > >> >
> > > >> > —
> > > >> > Denis
> > > >> >
> > > >> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >
> > > >> > > Also MERGE.
> > > >> > >
> > > >> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > >> > >> You need a hash code only for INSERT operation, right?
> > > >> > >>
> > > >> > >> —
> > > >> > >> Denis
> > > >> > >>
> > > >> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >>>
> > > >> > >>> But what if the user works from some kind of console and just
> > > types
> > > >> > >>> the queries as text in full and does not bind params via JDBC
> or
> > > >> > >>> something alike? What if there's no binary object? I don't see
> > > why we
> > > >> > >>> should keep the user from usual cache gets in this case. I
> > really
> > > >> like
> > > >> > >>> the idea of supplying the values of distinct fields, thus
> > freeing
> > > the
> > > >> > >>> user of the need to mess with objects and builders, AND then
> > just
> > > >> > >>> calculating hash code as suggested before - say, via
> explicitly
> > > >> > >>> listing participating fields in XML or by marking them with
> > > transient
> > > >> > >>> keyword or some annotation.
> > > >> > >>> Actually, I believe that's the only case when we need to
> > generate
> > > any
> > > >> > >>> hash codes - when the class is present, we can just get hash
> > code
> > > >> from
> > > >> > >>> its implementation of its method. When there's no class, we
> > > generate.
> > > >> > >>> And all that is solely for SQL. For the rest - just throw an
> > > >> exception
> > > >> > >>> when there's no hash code manually set for binary object. I
> > don't
> > > see
> > > >> > >>> why we should try to generate anything when the user already
> is
> > > using
> > > >> > >>> Ignite in full, not just via limited interface of SQL.
> > > >> > >>>
> > > >> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > > >> > >>>> Hmm, this is a good question.
> > > >> > >>>>
> > > >> > >>>> If a user doesn’t provide a _key when an INSERT is executed
> for
> > > me
> > > >> it
> > > >> > means that he is not going to use the key later for cache.get/put,
> > > >> DELETE,
> > > >> > UPDATE and other possible operation simply because he doesn’t know
> > > how to
> > > >> > reconstruct the key back in his code. If he wants to use the
> primary
> > > key
> > > >> in
> > > >> > the rest of operations then he must provide it at INSERT time.
> > > >> > >>>>
> > > >> > >>>> Do we need this key only for a case when an object is being
> > > inserted
> > > >> > into a cache? If it’s so I would auto-generate a key using ‘long’
> > as a
> > > >> key
> > > >> > type. I do remember that we provided the auto-generation for Spark
> > > module
> > > >> > in a some way that may be useful here.
> > > >> > >>>>
> > > >> > >>>> —
> > > >> > >>>> Denis
> > > >> > >>>>
> > > >> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >>>>>
> > > >> > >>>>> Denis,
> > > >> > >>>>>
> > > >> > >>>>> That's not what I was asking about.
> > > >> > >>>>> Currently DML implementation allows for dymanic
> instantiation
> > of
> > > >> > keys,
> > > >> > >>>>> in other words, user does not have to provide value for
> > > >> object-typed
> > > >> > >>>>> _key column - instead, he may supply just field values based
> > on
> > > >> which
> > > >> > >>>>> _key will be dynamically instantiated/binary built. And
> that's
> > > the
> > > >> > >>>>> whole point of this discussion as I see it: what to do when
> > > we've
> > > >> > >>>>> binary built classless key that we build ourselves inside
> SQL
> > > >> engine
> > > >> > >>>>> and don't know how to compute hash code for it?
> > > >> > >>>>>
> > > >> > >>>>> - Alex
> > > >> > >>>>>
> > > >> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dmagda@gridgain.com
> >:
> > > >> > >>>>>> Alexander,
> > > >> > >>>>>>
> > > >> > >>>>>> As I guess if we have a key without a class then it will be
> > > >> > constructed using a BinaryBuilder instance and it’s user
> > > responsibility
> > > >> to
> > > >> > set the has code at the end with BinaryBuilder.hasCode method.
> Sure,
> > > all
> > > >> > this cases must be well-documented in both Java Doc API and Apache
> > > Ignite
> > > >> > documentation.
> > > >> > >>>>>>
> > > >> > >>>>>> —
> > > >> > >>>>>> Denis
> > > >> > >>>>>>
> > > >> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >>>>>>>
> > > >> > >>>>>>> Dmitry, Denis,
> > > >> > >>>>>>>
> > > >> > >>>>>>> OK, but I think it's necessary to address also the cases
> > when
> > > >> > there's
> > > >> > >>>>>>> no actual class for the key, and its fields are simply
> > > declared
> > > >> in
> > > >> > >>>>>>> XML. In this case, there are no fields to be marked
> > transient.
> > > >> > What do
> > > >> > >>>>>>> we do then? List transient fields in XML separately?
> > > >> > >>>>>>>
> > > >> > >>>>>>> - Alex
> > > >> > >>>>>>>
> > > >> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
> > > >> dsetrakyan@apache.org
> > > >> > >:
> > > >> > >>>>>>>> Agree with Denis.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> - by default, all non-transient key fields should
> > > participate in
> > > >> > the
> > > >> > >>>>>>>> hashcode generation
> > > >> > >>>>>>>> - when working on DDL, then the primary key fields should
> > > >> > participate in
> > > >> > >>>>>>>> the hashcode
> > > >> > >>>>>>>> - we should add a resolver to override the default
> behavior
> > > >> > (please
> > > >> > >>>>>>>> propose the interface in Jira)
> > > >> > >>>>>>>> - we should print out a warning, only once per type, the
> > the
> > > >> > hashcode
> > > >> > >>>>>>>> has been automatically generated based on which fields
> and
> > > which
> > > >> > formula
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> D.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
> > > >> dmagda@gridgain.com>
> > > >> > wrote:
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>> Hi Alexander,
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we
> > must
> > > >> > keep in mind
> > > >> > >>>>>>>>> one important thing. Binary objects were designed to
> > address
> > > >> the
> > > >> > following
> > > >> > >>>>>>>>> disadvantages a regular serializer, like optimized
> > > marshaller,
> > > >> > has:
> > > >> > >>>>>>>>> necessity to deserialize an object on a server side
> every
> > > time
> > > >> > it’s needed.
> > > >> > >>>>>>>>> necessity to hold an object in both serialized and
> > > deserialized
> > > >> > forms on
> > > >> > >>>>>>>>> the server node.
> > > >> > >>>>>>>>> necessity to restart the whole cluster each time an
> object
> > > >> > version is
> > > >> > >>>>>>>>> changed (new field is added or an old one is removed).
> > > >> > >>>>>>>>> If it will be needed to perform step 3 for a default
> > > >> > implementation of the
> > > >> > >>>>>>>>> binary resolver just because the resolver has to
> consider
> > > new
> > > >> > fields or
> > > >> > >>>>>>>>> ignore old ones then such an implementation sucks.
> > Overall,
> > > the
> > > >> > default
> > > >> > >>>>>>>>> implementation should use the reflection coming over all
> > the
> > > >> > fields a key
> > > >> > >>>>>>>>> has ignoring the ones that are marked with “transient”
> > > keyword.
> > > >> > If a user
> > > >> > >>>>>>>>> wants to control the default resolver's logic then he
> can
> > > label
> > > >> > all the
> > > >> > >>>>>>>>> fields that mustn’t be of a final has code value with
> > > >> > “transient”. This has
> > > >> > >>>>>>>>> to be well-documented for sure.
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> Makes sense?
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> —
> > > >> > >>>>>>>>> Denis
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> > > >> > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> Hello Igniters,
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> As DML support is near, it's critical that we agree on
> > how
> > > we
> > > >> > generate
> > > >> > >>>>>>>>>> hash codes for new keys in presence of binary
> marshaller.
> > > >> > Actually,
> > > >> > >>>>>>>>>> this discussion isn't new - please see its beginning
> > here:
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> > > >> > >>>>>>>>> com/All-BinaryObjects-created-
> > > by-BinaryObjectBuilder-stored-
> > > >> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> Still, I'm creating this new thread to make getting to
> > the
> > > >> final
> > > >> > >>>>>>>>>> solution as simple and fast as possible.
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> I remind everyone that the approach that has got the
> > least
> > > >> > critics was
> > > >> > >>>>>>>>>> the one proposed by Vladimir Ozerov:
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> <quote>
> > > >> > >>>>>>>>>> I think we can do the following:
> > > >> > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> > > >> > >>>>>>>>>> 2) If object without a hash code is put to cache, throw
> > an
> > > >> > exception.
> > > >> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> > > >> > >>>>>>>>>> 4) Add default implementation which will auto-generate
> > hash
> > > >> > code. *Print
> > > >> > >>>>>>>>> a
> > > >> > >>>>>>>>>> warning when auto-generation occurs*, so that user is
> > aware
> > > >> > that he is
> > > >> > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> > > >> > >>>>>>>>>> 5) Add another implementation which will use encoded
> > > string to
> > > >> > calculate
> > > >> > >>>>>>>>> a
> > > >> > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a}
> *
> > > 31 +
> > > >> > {b}")*.
> > > >> > >>>>>>>>>> Originally proposed by Yakov some time ago.
> > > >> > >>>>>>>>>> </quote>
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> After that, Sergi suggested that instead of a "formula"
> > we
> > > >> keep
> > > >> > just a
> > > >> > >>>>>>>>>> list of the "fields" that participate in hash code
> > > evaluation,
> > > >> > and
> > > >> > >>>>>>>>>> with that list, we simply calculate hash code just like
> > IDE
> > > >> > does -
> > > >> > >>>>>>>>>> with all its bit shifts and additions.
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> I'm planning on settling down with this combined
> > Vlad-Sergi
> > > >> > approach.
> > > >> > >>>>>>>>>> Any objections?
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we
> > > both
> > > >> > throw an
> > > >> > >>>>>>>>>> exception on cache code absence and that we might
> > generate
> > > it
> > > >> > as the
> > > >> > >>>>>>>>>> last resort. Do I understand you correctly that you
> > suggest
> > > >> > generating
> > > >> > >>>>>>>>>> random code only in context of SQL, but throw exception
> > for
> > > >> keys
> > > >> > >>>>>>>>>> without codes on ordinary put?
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> And yes, built-in hash codes for JDK types are
> supported
> > as
> > > >> > well as
> > > >> > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue
> > of
> > > >> > IGNITE-3633
> > > >> > >>>>>>>>>> for the flag and its presence check).
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> - Alex
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>
> > > >> > >>>>>>
> > > >> > >>>>
> > > >> > >>
> > > >> >
> > > >> >
> > > >>
> > >
> >
>

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

Dmitry,

Just list them in XML config, in the section about indexed types. Will
do proposal for that on issue page later today.

Regarding validation of the field list - currently there's no (and
hardly can be) protection against miscalculation of hash codes passed
to the BinaryObjectBuilder. It's user's responsibility to maintain
integrity of their data and avoid conflicts in versions of the
settings. Java's hash map does not protect the user from its misuse -
you can change inner state of a key that you have put to the map in
such a way that you won't ever be able to retrieve it again, and map
itself can't protect the user from such cases. The user should know
what they are doing and understand that choosing hash code fields
wisely is the key to making the cache work correctly.

2016-09-29 19:19 GMT+03:00 Alexey Goncharuk <al...@gmail.com>:
> I want to step back a little bit. Why do we need to choose fields for
> hashCode at all? If all fields participate in equals, all of them should
> participate in hashCode as well. We already serialize a user key in order
> to get a value from cache, so we can use a hashCode based on binary object
> representation.
>
> 2016-09-29 19:13 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
>> Alexander,
>>
>> How do you plan to annotate fields that participate in the hashcode
>> calculation? Can you add all the changes you plan to make to the
>> configuration in the ticket and post the link here?
>>
>> Also, we must make sure that hashcode fields do not change. I believe you
>> should have validation in the code on startup of a system or of a cache,
>> and throw an exception if it fails.
>>
>> D.
>>
>> On Thu, Sep 29, 2016 at 9:08 AM, Alexander Paschenko <
>> alexander.a.paschenko@gmail.com> wrote:
>>
>> > Thanks everyone!
>> >
>> > Denis, yes, that's what I meant, now we're on the same page :)
>> > However, I'm worried about the same things Alexey is, that is, I'm not
>> > sure how we can handle presence of key fields that don't participate
>> > in 'equals' evaluation. Hence I'm all up for keeping mechanism of
>> > comparison for equality as it is now. And simply enhance the
>> > configuration as follows:
>> >
>> > 1. Introduce new BinaryObjectHashCodeResolver interface with single
>> > method having following signature:
>> >
>> > int computeHashCode(BinaryObject binObj)
>> >
>> > Cache configuration will then get a new corresponding param - probably
>> > a class/class name to instantiate resolver from.
>> >
>> > 2. Introduce default resolver by enhancing SQL engine configuration (=
>> > QueryEntity) - namely, add to it the list of hash code fields and
>> > optional affinity key field.
>> >
>> >
>> >
>> > 2016-09-29 18:39 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com
>> >:
>> > > Folks, let me point out a few obvious (or not) things
>> > >
>> > > A set of fields participating in hashCode and equals is impossible to
>> > > change without cluster restart. Imagine a new client adding a field F
>> to
>> > > key structure (A, B) so that new key is (A, B, F). In this case key (1,
>> > 2,
>> > > 0) will be treated as a different key w/respect to (1, 2, 3), while the
>> > new
>> > > client expects them to be the same. I think the set of fields defining
>> > the
>> > > key _must_ be used in both hash code and equals calculation and cannot
>> > > change over time. Having said that, we can use binary array comparison
>> as
>> > > object equality test. Presence of new 'garbage' fields in key looks
>> > useless
>> > > to me.
>> > >
>> > > So, we should support:
>> > > 1) All the things Dmitriy pointed out, in other words - multi-field key
>> > > with affinity key fields
>> > > 2) Ability to instantiate the key using a text-only environment
>> > > 3) Ability to define the structure of a key using XML or DDL
>> > >
>> > > I think all of these can be done via combination of HashCodeResolver
>> > > interface + default resolver which can read type configuration.
>> > >
>> > > 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>> > >
>> > >> Guys,
>> > >>
>> > >> We need to look at 3 cases:
>> > >>
>> > >> a) key is just one field
>> > >> b) key is multiple fields
>> > >> c) key is one or multiple fields, with possibility of an alternate
>> > affinity
>> > >> key
>> > >>
>> > >> For (a) and (b), whenever a type is defined in XML, and further in
>> DML,
>> > a
>> > >> user will specify which fields are part of the key. In that case, we
>> > should
>> > >> just grab those fields and calculate the hashcode automatically.
>> > >>
>> > >> For (c), a user should specify in XML, and further in DML, which
>> fields
>> > >> should be used for affinity, in addition to the key fields. In that
>> > case,
>> > >> we again should grab those fields and calculate the hashcodes for the
>> > >> primary key and the affinity key.
>> > >>
>> > >> I really am not sure if there are other ways of doing it. Am I missing
>> > >> something?
>> > >>
>> > >> D.
>> > >>
>> > >>
>> > >> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com>
>> > wrote:
>> > >>
>> > >> > Let me show the picture I have in my mind:
>> > >> >
>> > >> > Primary key is a must for all INSERT and MERGE operations. If it’s
>> not
>> > >> set
>> > >> > then an INSERT/MERGE fails.
>> > >> > If a primary key is a boxed/unboxed primitive (int, Integer, String,
>> > >> Date,
>> > >> > etc.) then the key value is used for hashCode calculation. At the
>> same
>> > >> time
>> > >> > the key will be an affinity key.
>> > >> > If a primary key is a custom object then it’s value can be passed
>> as a
>> > >> > param directly from Java, .Net, C++ in a way like “INSERT (_key,
>> > field1,
>> > >> > field2) VALUES (?, val1, val2)”. In this scenario we will call
>> > hashCode
>> > >> > directly on the key's value. In addition, we will be able to get an
>> > >> > affinity key since we know key’s type class descriptor
>> > >> > (BinaryClassDescriptor).
>> > >> > If a primary key is still a custom key and we want to insert its
>> value
>> > >> > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s
>> > value
>> > >> > as is. Here we’re trying to apply a workaround by listing key's
>> > fields in
>> > >> > INSERT/MERGE and the task is to properly re-construct the key on our
>> > side
>> > >> > using only specific fields.
>> > >> >
>> > >> > Is my understanding correct for all the bullets above?
>> > >> >
>> > >> > If so then, yes, I would agree that we need to list these fields in
>> a
>> > >> > configuration and the default hash code resolver will use them as
>> > well.
>> > >> > Moreover, we have to pin point an affinity field. So, the question
>> is
>> > >> what
>> > >> > the configuration we should use.
>> > >> >
>> > >> > Community, any other thoughts/ideas?
>> > >> >
>> > >> > —
>> > >> > Denis
>> > >> >
>> > >> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
>> > >> > alexander.a.paschenko@gmail.com> wrote:
>> > >> > >
>> > >> > > Also MERGE.
>> > >> > >
>> > >> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> > >> > >> You need a hash code only for INSERT operation, right?
>> > >> > >>
>> > >> > >> —
>> > >> > >> Denis
>> > >> > >>
>> > >> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
>> > >> > alexander.a.paschenko@gmail.com> wrote:
>> > >> > >>>
>> > >> > >>> But what if the user works from some kind of console and just
>> > types
>> > >> > >>> the queries as text in full and does not bind params via JDBC or
>> > >> > >>> something alike? What if there's no binary object? I don't see
>> > why we
>> > >> > >>> should keep the user from usual cache gets in this case. I
>> really
>> > >> like
>> > >> > >>> the idea of supplying the values of distinct fields, thus
>> freeing
>> > the
>> > >> > >>> user of the need to mess with objects and builders, AND then
>> just
>> > >> > >>> calculating hash code as suggested before - say, via explicitly
>> > >> > >>> listing participating fields in XML or by marking them with
>> > transient
>> > >> > >>> keyword or some annotation.
>> > >> > >>> Actually, I believe that's the only case when we need to
>> generate
>> > any
>> > >> > >>> hash codes - when the class is present, we can just get hash
>> code
>> > >> from
>> > >> > >>> its implementation of its method. When there's no class, we
>> > generate.
>> > >> > >>> And all that is solely for SQL. For the rest - just throw an
>> > >> exception
>> > >> > >>> when there's no hash code manually set for binary object. I
>> don't
>> > see
>> > >> > >>> why we should try to generate anything when the user already is
>> > using
>> > >> > >>> Ignite in full, not just via limited interface of SQL.
>> > >> > >>>
>> > >> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> > >> > >>>> Hmm, this is a good question.
>> > >> > >>>>
>> > >> > >>>> If a user doesn’t provide a _key when an INSERT is executed for
>> > me
>> > >> it
>> > >> > means that he is not going to use the key later for cache.get/put,
>> > >> DELETE,
>> > >> > UPDATE and other possible operation simply because he doesn’t know
>> > how to
>> > >> > reconstruct the key back in his code. If he wants to use the primary
>> > key
>> > >> in
>> > >> > the rest of operations then he must provide it at INSERT time.
>> > >> > >>>>
>> > >> > >>>> Do we need this key only for a case when an object is being
>> > inserted
>> > >> > into a cache? If it’s so I would auto-generate a key using ‘long’
>> as a
>> > >> key
>> > >> > type. I do remember that we provided the auto-generation for Spark
>> > module
>> > >> > in a some way that may be useful here.
>> > >> > >>>>
>> > >> > >>>> —
>> > >> > >>>> Denis
>> > >> > >>>>
>> > >> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
>> > >> > alexander.a.paschenko@gmail.com> wrote:
>> > >> > >>>>>
>> > >> > >>>>> Denis,
>> > >> > >>>>>
>> > >> > >>>>> That's not what I was asking about.
>> > >> > >>>>> Currently DML implementation allows for dymanic instantiation
>> of
>> > >> > keys,
>> > >> > >>>>> in other words, user does not have to provide value for
>> > >> object-typed
>> > >> > >>>>> _key column - instead, he may supply just field values based
>> on
>> > >> which
>> > >> > >>>>> _key will be dynamically instantiated/binary built. And that's
>> > the
>> > >> > >>>>> whole point of this discussion as I see it: what to do when
>> > we've
>> > >> > >>>>> binary built classless key that we build ourselves inside SQL
>> > >> engine
>> > >> > >>>>> and don't know how to compute hash code for it?
>> > >> > >>>>>
>> > >> > >>>>> - Alex
>> > >> > >>>>>
>> > >> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> > >> > >>>>>> Alexander,
>> > >> > >>>>>>
>> > >> > >>>>>> As I guess if we have a key without a class then it will be
>> > >> > constructed using a BinaryBuilder instance and it’s user
>> > responsibility
>> > >> to
>> > >> > set the has code at the end with BinaryBuilder.hasCode method. Sure,
>> > all
>> > >> > this cases must be well-documented in both Java Doc API and Apache
>> > Ignite
>> > >> > documentation.
>> > >> > >>>>>>
>> > >> > >>>>>> —
>> > >> > >>>>>> Denis
>> > >> > >>>>>>
>> > >> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
>> > >> > alexander.a.paschenko@gmail.com> wrote:
>> > >> > >>>>>>>
>> > >> > >>>>>>> Dmitry, Denis,
>> > >> > >>>>>>>
>> > >> > >>>>>>> OK, but I think it's necessary to address also the cases
>> when
>> > >> > there's
>> > >> > >>>>>>> no actual class for the key, and its fields are simply
>> > declared
>> > >> in
>> > >> > >>>>>>> XML. In this case, there are no fields to be marked
>> transient.
>> > >> > What do
>> > >> > >>>>>>> we do then? List transient fields in XML separately?
>> > >> > >>>>>>>
>> > >> > >>>>>>> - Alex
>> > >> > >>>>>>>
>> > >> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
>> > >> dsetrakyan@apache.org
>> > >> > >:
>> > >> > >>>>>>>> Agree with Denis.
>> > >> > >>>>>>>>
>> > >> > >>>>>>>> - by default, all non-transient key fields should
>> > participate in
>> > >> > the
>> > >> > >>>>>>>> hashcode generation
>> > >> > >>>>>>>> - when working on DDL, then the primary key fields should
>> > >> > participate in
>> > >> > >>>>>>>> the hashcode
>> > >> > >>>>>>>> - we should add a resolver to override the default behavior
>> > >> > (please
>> > >> > >>>>>>>> propose the interface in Jira)
>> > >> > >>>>>>>> - we should print out a warning, only once per type, the
>> the
>> > >> > hashcode
>> > >> > >>>>>>>> has been automatically generated based on which fields and
>> > which
>> > >> > formula
>> > >> > >>>>>>>>
>> > >> > >>>>>>>> D.
>> > >> > >>>>>>>>
>> > >> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
>> > >> dmagda@gridgain.com>
>> > >> > wrote:
>> > >> > >>>>>>>>
>> > >> > >>>>>>>>> Hi Alexander,
>> > >> > >>>>>>>>>
>> > >> > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we
>> must
>> > >> > keep in mind
>> > >> > >>>>>>>>> one important thing. Binary objects were designed to
>> address
>> > >> the
>> > >> > following
>> > >> > >>>>>>>>> disadvantages a regular serializer, like optimized
>> > marshaller,
>> > >> > has:
>> > >> > >>>>>>>>> necessity to deserialize an object on a server side every
>> > time
>> > >> > it’s needed.
>> > >> > >>>>>>>>> necessity to hold an object in both serialized and
>> > deserialized
>> > >> > forms on
>> > >> > >>>>>>>>> the server node.
>> > >> > >>>>>>>>> necessity to restart the whole cluster each time an object
>> > >> > version is
>> > >> > >>>>>>>>> changed (new field is added or an old one is removed).
>> > >> > >>>>>>>>> If it will be needed to perform step 3 for a default
>> > >> > implementation of the
>> > >> > >>>>>>>>> binary resolver just because the resolver has to consider
>> > new
>> > >> > fields or
>> > >> > >>>>>>>>> ignore old ones then such an implementation sucks.
>> Overall,
>> > the
>> > >> > default
>> > >> > >>>>>>>>> implementation should use the reflection coming over all
>> the
>> > >> > fields a key
>> > >> > >>>>>>>>> has ignoring the ones that are marked with “transient”
>> > keyword.
>> > >> > If a user
>> > >> > >>>>>>>>> wants to control the default resolver's logic then he can
>> > label
>> > >> > all the
>> > >> > >>>>>>>>> fields that mustn’t be of a final has code value with
>> > >> > “transient”. This has
>> > >> > >>>>>>>>> to be well-documented for sure.
>> > >> > >>>>>>>>>
>> > >> > >>>>>>>>> Makes sense?
>> > >> > >>>>>>>>>
>> > >> > >>>>>>>>> —
>> > >> > >>>>>>>>> Denis
>> > >> > >>>>>>>>>
>> > >> > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>> > >> > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> Hello Igniters,
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> As DML support is near, it's critical that we agree on
>> how
>> > we
>> > >> > generate
>> > >> > >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
>> > >> > Actually,
>> > >> > >>>>>>>>>> this discussion isn't new - please see its beginning
>> here:
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>> > >> > >>>>>>>>> com/All-BinaryObjects-created-
>> > by-BinaryObjectBuilder-stored-
>> > >> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> Still, I'm creating this new thread to make getting to
>> the
>> > >> final
>> > >> > >>>>>>>>>> solution as simple and fast as possible.
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> I remind everyone that the approach that has got the
>> least
>> > >> > critics was
>> > >> > >>>>>>>>>> the one proposed by Vladimir Ozerov:
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> <quote>
>> > >> > >>>>>>>>>> I think we can do the following:
>> > >> > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>> > >> > >>>>>>>>>> 2) If object without a hash code is put to cache, throw
>> an
>> > >> > exception.
>> > >> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>> > >> > >>>>>>>>>> 4) Add default implementation which will auto-generate
>> hash
>> > >> > code. *Print
>> > >> > >>>>>>>>> a
>> > >> > >>>>>>>>>> warning when auto-generation occurs*, so that user is
>> aware
>> > >> > that he is
>> > >> > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
>> > >> > >>>>>>>>>> 5) Add another implementation which will use encoded
>> > string to
>> > >> > calculate
>> > >> > >>>>>>>>> a
>> > >> > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} *
>> > 31 +
>> > >> > {b}")*.
>> > >> > >>>>>>>>>> Originally proposed by Yakov some time ago.
>> > >> > >>>>>>>>>> </quote>
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> After that, Sergi suggested that instead of a "formula"
>> we
>> > >> keep
>> > >> > just a
>> > >> > >>>>>>>>>> list of the "fields" that participate in hash code
>> > evaluation,
>> > >> > and
>> > >> > >>>>>>>>>> with that list, we simply calculate hash code just like
>> IDE
>> > >> > does -
>> > >> > >>>>>>>>>> with all its bit shifts and additions.
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> I'm planning on settling down with this combined
>> Vlad-Sergi
>> > >> > approach.
>> > >> > >>>>>>>>>> Any objections?
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we
>> > both
>> > >> > throw an
>> > >> > >>>>>>>>>> exception on cache code absence and that we might
>> generate
>> > it
>> > >> > as the
>> > >> > >>>>>>>>>> last resort. Do I understand you correctly that you
>> suggest
>> > >> > generating
>> > >> > >>>>>>>>>> random code only in context of SQL, but throw exception
>> for
>> > >> keys
>> > >> > >>>>>>>>>> without codes on ordinary put?
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported
>> as
>> > >> > well as
>> > >> > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue
>> of
>> > >> > IGNITE-3633
>> > >> > >>>>>>>>>> for the flag and its presence check).
>> > >> > >>>>>>>>>>
>> > >> > >>>>>>>>>> - Alex
>> > >> > >>>>>>>>>
>> > >> > >>>>>>>>>
>> > >> > >>>>>>
>> > >> > >>>>
>> > >> > >>
>> > >> >
>> > >> >
>> > >>
>> >
>>

Re: Default hash code generation strategy for new binary objects

Posted by Alexey Goncharuk <al...@gmail.com>.

I want to step back a little bit. Why do we need to choose fields for
hashCode at all? If all fields participate in equals, all of them should
participate in hashCode as well. We already serialize a user key in order
to get a value from cache, so we can use a hashCode based on binary object
representation.

2016-09-29 19:13 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Alexander,
>
> How do you plan to annotate fields that participate in the hashcode
> calculation? Can you add all the changes you plan to make to the
> configuration in the ticket and post the link here?
>
> Also, we must make sure that hashcode fields do not change. I believe you
> should have validation in the code on startup of a system or of a cache,
> and throw an exception if it fails.
>
> D.
>
> On Thu, Sep 29, 2016 at 9:08 AM, Alexander Paschenko <
> alexander.a.paschenko@gmail.com> wrote:
>
> > Thanks everyone!
> >
> > Denis, yes, that's what I meant, now we're on the same page :)
> > However, I'm worried about the same things Alexey is, that is, I'm not
> > sure how we can handle presence of key fields that don't participate
> > in 'equals' evaluation. Hence I'm all up for keeping mechanism of
> > comparison for equality as it is now. And simply enhance the
> > configuration as follows:
> >
> > 1. Introduce new BinaryObjectHashCodeResolver interface with single
> > method having following signature:
> >
> > int computeHashCode(BinaryObject binObj)
> >
> > Cache configuration will then get a new corresponding param - probably
> > a class/class name to instantiate resolver from.
> >
> > 2. Introduce default resolver by enhancing SQL engine configuration (=
> > QueryEntity) - namely, add to it the list of hash code fields and
> > optional affinity key field.
> >
> >
> >
> > 2016-09-29 18:39 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com
> >:
> > > Folks, let me point out a few obvious (or not) things
> > >
> > > A set of fields participating in hashCode and equals is impossible to
> > > change without cluster restart. Imagine a new client adding a field F
> to
> > > key structure (A, B) so that new key is (A, B, F). In this case key (1,
> > 2,
> > > 0) will be treated as a different key w/respect to (1, 2, 3), while the
> > new
> > > client expects them to be the same. I think the set of fields defining
> > the
> > > key _must_ be used in both hash code and equals calculation and cannot
> > > change over time. Having said that, we can use binary array comparison
> as
> > > object equality test. Presence of new 'garbage' fields in key looks
> > useless
> > > to me.
> > >
> > > So, we should support:
> > > 1) All the things Dmitriy pointed out, in other words - multi-field key
> > > with affinity key fields
> > > 2) Ability to instantiate the key using a text-only environment
> > > 3) Ability to define the structure of a key using XML or DDL
> > >
> > > I think all of these can be done via combination of HashCodeResolver
> > > interface + default resolver which can read type configuration.
> > >
> > > 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > >
> > >> Guys,
> > >>
> > >> We need to look at 3 cases:
> > >>
> > >> a) key is just one field
> > >> b) key is multiple fields
> > >> c) key is one or multiple fields, with possibility of an alternate
> > affinity
> > >> key
> > >>
> > >> For (a) and (b), whenever a type is defined in XML, and further in
> DML,
> > a
> > >> user will specify which fields are part of the key. In that case, we
> > should
> > >> just grab those fields and calculate the hashcode automatically.
> > >>
> > >> For (c), a user should specify in XML, and further in DML, which
> fields
> > >> should be used for affinity, in addition to the key fields. In that
> > case,
> > >> we again should grab those fields and calculate the hashcodes for the
> > >> primary key and the affinity key.
> > >>
> > >> I really am not sure if there are other ways of doing it. Am I missing
> > >> something?
> > >>
> > >> D.
> > >>
> > >>
> > >> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com>
> > wrote:
> > >>
> > >> > Let me show the picture I have in my mind:
> > >> >
> > >> > Primary key is a must for all INSERT and MERGE operations. If it’s
> not
> > >> set
> > >> > then an INSERT/MERGE fails.
> > >> > If a primary key is a boxed/unboxed primitive (int, Integer, String,
> > >> Date,
> > >> > etc.) then the key value is used for hashCode calculation. At the
> same
> > >> time
> > >> > the key will be an affinity key.
> > >> > If a primary key is a custom object then it’s value can be passed
> as a
> > >> > param directly from Java, .Net, C++ in a way like “INSERT (_key,
> > field1,
> > >> > field2) VALUES (?, val1, val2)”. In this scenario we will call
> > hashCode
> > >> > directly on the key's value. In addition, we will be able to get an
> > >> > affinity key since we know key’s type class descriptor
> > >> > (BinaryClassDescriptor).
> > >> > If a primary key is still a custom key and we want to insert its
> value
> > >> > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s
> > value
> > >> > as is. Here we’re trying to apply a workaround by listing key's
> > fields in
> > >> > INSERT/MERGE and the task is to properly re-construct the key on our
> > side
> > >> > using only specific fields.
> > >> >
> > >> > Is my understanding correct for all the bullets above?
> > >> >
> > >> > If so then, yes, I would agree that we need to list these fields in
> a
> > >> > configuration and the default hash code resolver will use them as
> > well.
> > >> > Moreover, we have to pin point an affinity field. So, the question
> is
> > >> what
> > >> > the configuration we should use.
> > >> >
> > >> > Community, any other thoughts/ideas?
> > >> >
> > >> > —
> > >> > Denis
> > >> >
> > >> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> > >> > alexander.a.paschenko@gmail.com> wrote:
> > >> > >
> > >> > > Also MERGE.
> > >> > >
> > >> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > >> > >> You need a hash code only for INSERT operation, right?
> > >> > >>
> > >> > >> —
> > >> > >> Denis
> > >> > >>
> > >> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> > >> > alexander.a.paschenko@gmail.com> wrote:
> > >> > >>>
> > >> > >>> But what if the user works from some kind of console and just
> > types
> > >> > >>> the queries as text in full and does not bind params via JDBC or
> > >> > >>> something alike? What if there's no binary object? I don't see
> > why we
> > >> > >>> should keep the user from usual cache gets in this case. I
> really
> > >> like
> > >> > >>> the idea of supplying the values of distinct fields, thus
> freeing
> > the
> > >> > >>> user of the need to mess with objects and builders, AND then
> just
> > >> > >>> calculating hash code as suggested before - say, via explicitly
> > >> > >>> listing participating fields in XML or by marking them with
> > transient
> > >> > >>> keyword or some annotation.
> > >> > >>> Actually, I believe that's the only case when we need to
> generate
> > any
> > >> > >>> hash codes - when the class is present, we can just get hash
> code
> > >> from
> > >> > >>> its implementation of its method. When there's no class, we
> > generate.
> > >> > >>> And all that is solely for SQL. For the rest - just throw an
> > >> exception
> > >> > >>> when there's no hash code manually set for binary object. I
> don't
> > see
> > >> > >>> why we should try to generate anything when the user already is
> > using
> > >> > >>> Ignite in full, not just via limited interface of SQL.
> > >> > >>>
> > >> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > >> > >>>> Hmm, this is a good question.
> > >> > >>>>
> > >> > >>>> If a user doesn’t provide a _key when an INSERT is executed for
> > me
> > >> it
> > >> > means that he is not going to use the key later for cache.get/put,
> > >> DELETE,
> > >> > UPDATE and other possible operation simply because he doesn’t know
> > how to
> > >> > reconstruct the key back in his code. If he wants to use the primary
> > key
> > >> in
> > >> > the rest of operations then he must provide it at INSERT time.
> > >> > >>>>
> > >> > >>>> Do we need this key only for a case when an object is being
> > inserted
> > >> > into a cache? If it’s so I would auto-generate a key using ‘long’
> as a
> > >> key
> > >> > type. I do remember that we provided the auto-generation for Spark
> > module
> > >> > in a some way that may be useful here.
> > >> > >>>>
> > >> > >>>> —
> > >> > >>>> Denis
> > >> > >>>>
> > >> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> > >> > alexander.a.paschenko@gmail.com> wrote:
> > >> > >>>>>
> > >> > >>>>> Denis,
> > >> > >>>>>
> > >> > >>>>> That's not what I was asking about.
> > >> > >>>>> Currently DML implementation allows for dymanic instantiation
> of
> > >> > keys,
> > >> > >>>>> in other words, user does not have to provide value for
> > >> object-typed
> > >> > >>>>> _key column - instead, he may supply just field values based
> on
> > >> which
> > >> > >>>>> _key will be dynamically instantiated/binary built. And that's
> > the
> > >> > >>>>> whole point of this discussion as I see it: what to do when
> > we've
> > >> > >>>>> binary built classless key that we build ourselves inside SQL
> > >> engine
> > >> > >>>>> and don't know how to compute hash code for it?
> > >> > >>>>>
> > >> > >>>>> - Alex
> > >> > >>>>>
> > >> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > >> > >>>>>> Alexander,
> > >> > >>>>>>
> > >> > >>>>>> As I guess if we have a key without a class then it will be
> > >> > constructed using a BinaryBuilder instance and it’s user
> > responsibility
> > >> to
> > >> > set the has code at the end with BinaryBuilder.hasCode method. Sure,
> > all
> > >> > this cases must be well-documented in both Java Doc API and Apache
> > Ignite
> > >> > documentation.
> > >> > >>>>>>
> > >> > >>>>>> —
> > >> > >>>>>> Denis
> > >> > >>>>>>
> > >> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> > >> > alexander.a.paschenko@gmail.com> wrote:
> > >> > >>>>>>>
> > >> > >>>>>>> Dmitry, Denis,
> > >> > >>>>>>>
> > >> > >>>>>>> OK, but I think it's necessary to address also the cases
> when
> > >> > there's
> > >> > >>>>>>> no actual class for the key, and its fields are simply
> > declared
> > >> in
> > >> > >>>>>>> XML. In this case, there are no fields to be marked
> transient.
> > >> > What do
> > >> > >>>>>>> we do then? List transient fields in XML separately?
> > >> > >>>>>>>
> > >> > >>>>>>> - Alex
> > >> > >>>>>>>
> > >> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
> > >> dsetrakyan@apache.org
> > >> > >:
> > >> > >>>>>>>> Agree with Denis.
> > >> > >>>>>>>>
> > >> > >>>>>>>> - by default, all non-transient key fields should
> > participate in
> > >> > the
> > >> > >>>>>>>> hashcode generation
> > >> > >>>>>>>> - when working on DDL, then the primary key fields should
> > >> > participate in
> > >> > >>>>>>>> the hashcode
> > >> > >>>>>>>> - we should add a resolver to override the default behavior
> > >> > (please
> > >> > >>>>>>>> propose the interface in Jira)
> > >> > >>>>>>>> - we should print out a warning, only once per type, the
> the
> > >> > hashcode
> > >> > >>>>>>>> has been automatically generated based on which fields and
> > which
> > >> > formula
> > >> > >>>>>>>>
> > >> > >>>>>>>> D.
> > >> > >>>>>>>>
> > >> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
> > >> dmagda@gridgain.com>
> > >> > wrote:
> > >> > >>>>>>>>
> > >> > >>>>>>>>> Hi Alexander,
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we
> must
> > >> > keep in mind
> > >> > >>>>>>>>> one important thing. Binary objects were designed to
> address
> > >> the
> > >> > following
> > >> > >>>>>>>>> disadvantages a regular serializer, like optimized
> > marshaller,
> > >> > has:
> > >> > >>>>>>>>> necessity to deserialize an object on a server side every
> > time
> > >> > it’s needed.
> > >> > >>>>>>>>> necessity to hold an object in both serialized and
> > deserialized
> > >> > forms on
> > >> > >>>>>>>>> the server node.
> > >> > >>>>>>>>> necessity to restart the whole cluster each time an object
> > >> > version is
> > >> > >>>>>>>>> changed (new field is added or an old one is removed).
> > >> > >>>>>>>>> If it will be needed to perform step 3 for a default
> > >> > implementation of the
> > >> > >>>>>>>>> binary resolver just because the resolver has to consider
> > new
> > >> > fields or
> > >> > >>>>>>>>> ignore old ones then such an implementation sucks.
> Overall,
> > the
> > >> > default
> > >> > >>>>>>>>> implementation should use the reflection coming over all
> the
> > >> > fields a key
> > >> > >>>>>>>>> has ignoring the ones that are marked with “transient”
> > keyword.
> > >> > If a user
> > >> > >>>>>>>>> wants to control the default resolver's logic then he can
> > label
> > >> > all the
> > >> > >>>>>>>>> fields that mustn’t be of a final has code value with
> > >> > “transient”. This has
> > >> > >>>>>>>>> to be well-documented for sure.
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> Makes sense?
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> —
> > >> > >>>>>>>>> Denis
> > >> > >>>>>>>>>
> > >> > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> > >> > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Hello Igniters,
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> As DML support is near, it's critical that we agree on
> how
> > we
> > >> > generate
> > >> > >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
> > >> > Actually,
> > >> > >>>>>>>>>> this discussion isn't new - please see its beginning
> here:
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> > >> > >>>>>>>>> com/All-BinaryObjects-created-
> > by-BinaryObjectBuilder-stored-
> > >> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Still, I'm creating this new thread to make getting to
> the
> > >> final
> > >> > >>>>>>>>>> solution as simple and fast as possible.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> I remind everyone that the approach that has got the
> least
> > >> > critics was
> > >> > >>>>>>>>>> the one proposed by Vladimir Ozerov:
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> <quote>
> > >> > >>>>>>>>>> I think we can do the following:
> > >> > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> > >> > >>>>>>>>>> 2) If object without a hash code is put to cache, throw
> an
> > >> > exception.
> > >> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> > >> > >>>>>>>>>> 4) Add default implementation which will auto-generate
> hash
> > >> > code. *Print
> > >> > >>>>>>>>> a
> > >> > >>>>>>>>>> warning when auto-generation occurs*, so that user is
> aware
> > >> > that he is
> > >> > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> > >> > >>>>>>>>>> 5) Add another implementation which will use encoded
> > string to
> > >> > calculate
> > >> > >>>>>>>>> a
> > >> > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} *
> > 31 +
> > >> > {b}")*.
> > >> > >>>>>>>>>> Originally proposed by Yakov some time ago.
> > >> > >>>>>>>>>> </quote>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> After that, Sergi suggested that instead of a "formula"
> we
> > >> keep
> > >> > just a
> > >> > >>>>>>>>>> list of the "fields" that participate in hash code
> > evaluation,
> > >> > and
> > >> > >>>>>>>>>> with that list, we simply calculate hash code just like
> IDE
> > >> > does -
> > >> > >>>>>>>>>> with all its bit shifts and additions.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> I'm planning on settling down with this combined
> Vlad-Sergi
> > >> > approach.
> > >> > >>>>>>>>>> Any objections?
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we
> > both
> > >> > throw an
> > >> > >>>>>>>>>> exception on cache code absence and that we might
> generate
> > it
> > >> > as the
> > >> > >>>>>>>>>> last resort. Do I understand you correctly that you
> suggest
> > >> > generating
> > >> > >>>>>>>>>> random code only in context of SQL, but throw exception
> for
> > >> keys
> > >> > >>>>>>>>>> without codes on ordinary put?
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported
> as
> > >> > well as
> > >> > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue
> of
> > >> > IGNITE-3633
> > >> > >>>>>>>>>> for the flag and its presence check).
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> - Alex
> > >> > >>>>>>>>>
> > >> > >>>>>>>>>
> > >> > >>>>>>
> > >> > >>>>
> > >> > >>
> > >> >
> > >> >
> > >>
> >
>

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Alexander,

How do you plan to annotate fields that participate in the hashcode
calculation? Can you add all the changes you plan to make to the
configuration in the ticket and post the link here?

Also, we must make sure that hashcode fields do not change. I believe you
should have validation in the code on startup of a system or of a cache,
and throw an exception if it fails.

D.

On Thu, Sep 29, 2016 at 9:08 AM, Alexander Paschenko <
alexander.a.paschenko@gmail.com> wrote:

> Thanks everyone!
>
> Denis, yes, that's what I meant, now we're on the same page :)
> However, I'm worried about the same things Alexey is, that is, I'm not
> sure how we can handle presence of key fields that don't participate
> in 'equals' evaluation. Hence I'm all up for keeping mechanism of
> comparison for equality as it is now. And simply enhance the
> configuration as follows:
>
> 1. Introduce new BinaryObjectHashCodeResolver interface with single
> method having following signature:
>
> int computeHashCode(BinaryObject binObj)
>
> Cache configuration will then get a new corresponding param - probably
> a class/class name to instantiate resolver from.
>
> 2. Introduce default resolver by enhancing SQL engine configuration (=
> QueryEntity) - namely, add to it the list of hash code fields and
> optional affinity key field.
>
>
>
> 2016-09-29 18:39 GMT+03:00 Alexey Goncharuk <al...@gmail.com>:
> > Folks, let me point out a few obvious (or not) things
> >
> > A set of fields participating in hashCode and equals is impossible to
> > change without cluster restart. Imagine a new client adding a field F to
> > key structure (A, B) so that new key is (A, B, F). In this case key (1,
> 2,
> > 0) will be treated as a different key w/respect to (1, 2, 3), while the
> new
> > client expects them to be the same. I think the set of fields defining
> the
> > key _must_ be used in both hash code and equals calculation and cannot
> > change over time. Having said that, we can use binary array comparison as
> > object equality test. Presence of new 'garbage' fields in key looks
> useless
> > to me.
> >
> > So, we should support:
> > 1) All the things Dmitriy pointed out, in other words - multi-field key
> > with affinity key fields
> > 2) Ability to instantiate the key using a text-only environment
> > 3) Ability to define the structure of a key using XML or DDL
> >
> > I think all of these can be done via combination of HashCodeResolver
> > interface + default resolver which can read type configuration.
> >
> > 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> >> Guys,
> >>
> >> We need to look at 3 cases:
> >>
> >> a) key is just one field
> >> b) key is multiple fields
> >> c) key is one or multiple fields, with possibility of an alternate
> affinity
> >> key
> >>
> >> For (a) and (b), whenever a type is defined in XML, and further in DML,
> a
> >> user will specify which fields are part of the key. In that case, we
> should
> >> just grab those fields and calculate the hashcode automatically.
> >>
> >> For (c), a user should specify in XML, and further in DML, which fields
> >> should be used for affinity, in addition to the key fields. In that
> case,
> >> we again should grab those fields and calculate the hashcodes for the
> >> primary key and the affinity key.
> >>
> >> I really am not sure if there are other ways of doing it. Am I missing
> >> something?
> >>
> >> D.
> >>
> >>
> >> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com>
> wrote:
> >>
> >> > Let me show the picture I have in my mind:
> >> >
> >> > Primary key is a must for all INSERT and MERGE operations. If it’s not
> >> set
> >> > then an INSERT/MERGE fails.
> >> > If a primary key is a boxed/unboxed primitive (int, Integer, String,
> >> Date,
> >> > etc.) then the key value is used for hashCode calculation. At the same
> >> time
> >> > the key will be an affinity key.
> >> > If a primary key is a custom object then it’s value can be passed as a
> >> > param directly from Java, .Net, C++ in a way like “INSERT (_key,
> field1,
> >> > field2) VALUES (?, val1, val2)”. In this scenario we will call
> hashCode
> >> > directly on the key's value. In addition, we will be able to get an
> >> > affinity key since we know key’s type class descriptor
> >> > (BinaryClassDescriptor).
> >> > If a primary key is still a custom key and we want to insert its value
> >> > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s
> value
> >> > as is. Here we’re trying to apply a workaround by listing key's
> fields in
> >> > INSERT/MERGE and the task is to properly re-construct the key on our
> side
> >> > using only specific fields.
> >> >
> >> > Is my understanding correct for all the bullets above?
> >> >
> >> > If so then, yes, I would agree that we need to list these fields in a
> >> > configuration and the default hash code resolver will use them as
> well.
> >> > Moreover, we have to pin point an affinity field. So, the question is
> >> what
> >> > the configuration we should use.
> >> >
> >> > Community, any other thoughts/ideas?
> >> >
> >> > —
> >> > Denis
> >> >
> >> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> >> > alexander.a.paschenko@gmail.com> wrote:
> >> > >
> >> > > Also MERGE.
> >> > >
> >> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> >> > >> You need a hash code only for INSERT operation, right?
> >> > >>
> >> > >> —
> >> > >> Denis
> >> > >>
> >> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> >> > alexander.a.paschenko@gmail.com> wrote:
> >> > >>>
> >> > >>> But what if the user works from some kind of console and just
> types
> >> > >>> the queries as text in full and does not bind params via JDBC or
> >> > >>> something alike? What if there's no binary object? I don't see
> why we
> >> > >>> should keep the user from usual cache gets in this case. I really
> >> like
> >> > >>> the idea of supplying the values of distinct fields, thus freeing
> the
> >> > >>> user of the need to mess with objects and builders, AND then just
> >> > >>> calculating hash code as suggested before - say, via explicitly
> >> > >>> listing participating fields in XML or by marking them with
> transient
> >> > >>> keyword or some annotation.
> >> > >>> Actually, I believe that's the only case when we need to generate
> any
> >> > >>> hash codes - when the class is present, we can just get hash code
> >> from
> >> > >>> its implementation of its method. When there's no class, we
> generate.
> >> > >>> And all that is solely for SQL. For the rest - just throw an
> >> exception
> >> > >>> when there's no hash code manually set for binary object. I don't
> see
> >> > >>> why we should try to generate anything when the user already is
> using
> >> > >>> Ignite in full, not just via limited interface of SQL.
> >> > >>>
> >> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> >> > >>>> Hmm, this is a good question.
> >> > >>>>
> >> > >>>> If a user doesn’t provide a _key when an INSERT is executed for
> me
> >> it
> >> > means that he is not going to use the key later for cache.get/put,
> >> DELETE,
> >> > UPDATE and other possible operation simply because he doesn’t know
> how to
> >> > reconstruct the key back in his code. If he wants to use the primary
> key
> >> in
> >> > the rest of operations then he must provide it at INSERT time.
> >> > >>>>
> >> > >>>> Do we need this key only for a case when an object is being
> inserted
> >> > into a cache? If it’s so I would auto-generate a key using ‘long’ as a
> >> key
> >> > type. I do remember that we provided the auto-generation for Spark
> module
> >> > in a some way that may be useful here.
> >> > >>>>
> >> > >>>> —
> >> > >>>> Denis
> >> > >>>>
> >> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> >> > alexander.a.paschenko@gmail.com> wrote:
> >> > >>>>>
> >> > >>>>> Denis,
> >> > >>>>>
> >> > >>>>> That's not what I was asking about.
> >> > >>>>> Currently DML implementation allows for dymanic instantiation of
> >> > keys,
> >> > >>>>> in other words, user does not have to provide value for
> >> object-typed
> >> > >>>>> _key column - instead, he may supply just field values based on
> >> which
> >> > >>>>> _key will be dynamically instantiated/binary built. And that's
> the
> >> > >>>>> whole point of this discussion as I see it: what to do when
> we've
> >> > >>>>> binary built classless key that we build ourselves inside SQL
> >> engine
> >> > >>>>> and don't know how to compute hash code for it?
> >> > >>>>>
> >> > >>>>> - Alex
> >> > >>>>>
> >> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> >> > >>>>>> Alexander,
> >> > >>>>>>
> >> > >>>>>> As I guess if we have a key without a class then it will be
> >> > constructed using a BinaryBuilder instance and it’s user
> responsibility
> >> to
> >> > set the has code at the end with BinaryBuilder.hasCode method. Sure,
> all
> >> > this cases must be well-documented in both Java Doc API and Apache
> Ignite
> >> > documentation.
> >> > >>>>>>
> >> > >>>>>> —
> >> > >>>>>> Denis
> >> > >>>>>>
> >> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> >> > alexander.a.paschenko@gmail.com> wrote:
> >> > >>>>>>>
> >> > >>>>>>> Dmitry, Denis,
> >> > >>>>>>>
> >> > >>>>>>> OK, but I think it's necessary to address also the cases when
> >> > there's
> >> > >>>>>>> no actual class for the key, and its fields are simply
> declared
> >> in
> >> > >>>>>>> XML. In this case, there are no fields to be marked transient.
> >> > What do
> >> > >>>>>>> we do then? List transient fields in XML separately?
> >> > >>>>>>>
> >> > >>>>>>> - Alex
> >> > >>>>>>>
> >> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
> >> dsetrakyan@apache.org
> >> > >:
> >> > >>>>>>>> Agree with Denis.
> >> > >>>>>>>>
> >> > >>>>>>>> - by default, all non-transient key fields should
> participate in
> >> > the
> >> > >>>>>>>> hashcode generation
> >> > >>>>>>>> - when working on DDL, then the primary key fields should
> >> > participate in
> >> > >>>>>>>> the hashcode
> >> > >>>>>>>> - we should add a resolver to override the default behavior
> >> > (please
> >> > >>>>>>>> propose the interface in Jira)
> >> > >>>>>>>> - we should print out a warning, only once per type, the the
> >> > hashcode
> >> > >>>>>>>> has been automatically generated based on which fields and
> which
> >> > formula
> >> > >>>>>>>>
> >> > >>>>>>>> D.
> >> > >>>>>>>>
> >> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
> >> dmagda@gridgain.com>
> >> > wrote:
> >> > >>>>>>>>
> >> > >>>>>>>>> Hi Alexander,
> >> > >>>>>>>>>
> >> > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must
> >> > keep in mind
> >> > >>>>>>>>> one important thing. Binary objects were designed to address
> >> the
> >> > following
> >> > >>>>>>>>> disadvantages a regular serializer, like optimized
> marshaller,
> >> > has:
> >> > >>>>>>>>> necessity to deserialize an object on a server side every
> time
> >> > it’s needed.
> >> > >>>>>>>>> necessity to hold an object in both serialized and
> deserialized
> >> > forms on
> >> > >>>>>>>>> the server node.
> >> > >>>>>>>>> necessity to restart the whole cluster each time an object
> >> > version is
> >> > >>>>>>>>> changed (new field is added or an old one is removed).
> >> > >>>>>>>>> If it will be needed to perform step 3 for a default
> >> > implementation of the
> >> > >>>>>>>>> binary resolver just because the resolver has to consider
> new
> >> > fields or
> >> > >>>>>>>>> ignore old ones then such an implementation sucks. Overall,
> the
> >> > default
> >> > >>>>>>>>> implementation should use the reflection coming over all the
> >> > fields a key
> >> > >>>>>>>>> has ignoring the ones that are marked with “transient”
> keyword.
> >> > If a user
> >> > >>>>>>>>> wants to control the default resolver's logic then he can
> label
> >> > all the
> >> > >>>>>>>>> fields that mustn’t be of a final has code value with
> >> > “transient”. This has
> >> > >>>>>>>>> to be well-documented for sure.
> >> > >>>>>>>>>
> >> > >>>>>>>>> Makes sense?
> >> > >>>>>>>>>
> >> > >>>>>>>>> —
> >> > >>>>>>>>> Denis
> >> > >>>>>>>>>
> >> > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> >> > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> Hello Igniters,
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> As DML support is near, it's critical that we agree on how
> we
> >> > generate
> >> > >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
> >> > Actually,
> >> > >>>>>>>>>> this discussion isn't new - please see its beginning here:
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> >> > >>>>>>>>> com/All-BinaryObjects-created-
> by-BinaryObjectBuilder-stored-
> >> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> Still, I'm creating this new thread to make getting to the
> >> final
> >> > >>>>>>>>>> solution as simple and fast as possible.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> I remind everyone that the approach that has got the least
> >> > critics was
> >> > >>>>>>>>>> the one proposed by Vladimir Ozerov:
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> <quote>
> >> > >>>>>>>>>> I think we can do the following:
> >> > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> >> > >>>>>>>>>> 2) If object without a hash code is put to cache, throw an
> >> > exception.
> >> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> >> > >>>>>>>>>> 4) Add default implementation which will auto-generate hash
> >> > code. *Print
> >> > >>>>>>>>> a
> >> > >>>>>>>>>> warning when auto-generation occurs*, so that user is aware
> >> > that he is
> >> > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> >> > >>>>>>>>>> 5) Add another implementation which will use encoded
> string to
> >> > calculate
> >> > >>>>>>>>> a
> >> > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} *
> 31 +
> >> > {b}")*.
> >> > >>>>>>>>>> Originally proposed by Yakov some time ago.
> >> > >>>>>>>>>> </quote>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> After that, Sergi suggested that instead of a "formula" we
> >> keep
> >> > just a
> >> > >>>>>>>>>> list of the "fields" that participate in hash code
> evaluation,
> >> > and
> >> > >>>>>>>>>> with that list, we simply calculate hash code just like IDE
> >> > does -
> >> > >>>>>>>>>> with all its bit shifts and additions.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi
> >> > approach.
> >> > >>>>>>>>>> Any objections?
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we
> both
> >> > throw an
> >> > >>>>>>>>>> exception on cache code absence and that we might generate
> it
> >> > as the
> >> > >>>>>>>>>> last resort. Do I understand you correctly that you suggest
> >> > generating
> >> > >>>>>>>>>> random code only in context of SQL, but throw exception for
> >> keys
> >> > >>>>>>>>>> without codes on ordinary put?
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported as
> >> > well as
> >> > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of
> >> > IGNITE-3633
> >> > >>>>>>>>>> for the flag and its presence check).
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> - Alex
> >> > >>>>>>>>>
> >> > >>>>>>>>>
> >> > >>>>>>
> >> > >>>>
> >> > >>
> >> >
> >> >
> >>
>

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

Thanks everyone!

Denis, yes, that's what I meant, now we're on the same page :)
However, I'm worried about the same things Alexey is, that is, I'm not
sure how we can handle presence of key fields that don't participate
in 'equals' evaluation. Hence I'm all up for keeping mechanism of
comparison for equality as it is now. And simply enhance the
configuration as follows:

1. Introduce new BinaryObjectHashCodeResolver interface with single
method having following signature:

int computeHashCode(BinaryObject binObj)

Cache configuration will then get a new corresponding param - probably
a class/class name to instantiate resolver from.

2. Introduce default resolver by enhancing SQL engine configuration (=
QueryEntity) - namely, add to it the list of hash code fields and
optional affinity key field.



2016-09-29 18:39 GMT+03:00 Alexey Goncharuk <al...@gmail.com>:
> Folks, let me point out a few obvious (or not) things
>
> A set of fields participating in hashCode and equals is impossible to
> change without cluster restart. Imagine a new client adding a field F to
> key structure (A, B) so that new key is (A, B, F). In this case key (1, 2,
> 0) will be treated as a different key w/respect to (1, 2, 3), while the new
> client expects them to be the same. I think the set of fields defining the
> key _must_ be used in both hash code and equals calculation and cannot
> change over time. Having said that, we can use binary array comparison as
> object equality test. Presence of new 'garbage' fields in key looks useless
> to me.
>
> So, we should support:
> 1) All the things Dmitriy pointed out, in other words - multi-field key
> with affinity key fields
> 2) Ability to instantiate the key using a text-only environment
> 3) Ability to define the structure of a key using XML or DDL
>
> I think all of these can be done via combination of HashCodeResolver
> interface + default resolver which can read type configuration.
>
> 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
>> Guys,
>>
>> We need to look at 3 cases:
>>
>> a) key is just one field
>> b) key is multiple fields
>> c) key is one or multiple fields, with possibility of an alternate affinity
>> key
>>
>> For (a) and (b), whenever a type is defined in XML, and further in DML, a
>> user will specify which fields are part of the key. In that case, we should
>> just grab those fields and calculate the hashcode automatically.
>>
>> For (c), a user should specify in XML, and further in DML, which fields
>> should be used for affinity, in addition to the key fields. In that case,
>> we again should grab those fields and calculate the hashcodes for the
>> primary key and the affinity key.
>>
>> I really am not sure if there are other ways of doing it. Am I missing
>> something?
>>
>> D.
>>
>>
>> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com> wrote:
>>
>> > Let me show the picture I have in my mind:
>> >
>> > Primary key is a must for all INSERT and MERGE operations. If it’s not
>> set
>> > then an INSERT/MERGE fails.
>> > If a primary key is a boxed/unboxed primitive (int, Integer, String,
>> Date,
>> > etc.) then the key value is used for hashCode calculation. At the same
>> time
>> > the key will be an affinity key.
>> > If a primary key is a custom object then it’s value can be passed as a
>> > param directly from Java, .Net, C++ in a way like “INSERT (_key, field1,
>> > field2) VALUES (?, val1, val2)”. In this scenario we will call hashCode
>> > directly on the key's value. In addition, we will be able to get an
>> > affinity key since we know key’s type class descriptor
>> > (BinaryClassDescriptor).
>> > If a primary key is still a custom key and we want to insert its value
>> > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s value
>> > as is. Here we’re trying to apply a workaround by listing key's fields in
>> > INSERT/MERGE and the task is to properly re-construct the key on our side
>> > using only specific fields.
>> >
>> > Is my understanding correct for all the bullets above?
>> >
>> > If so then, yes, I would agree that we need to list these fields in a
>> > configuration and the default hash code resolver will use them as well.
>> > Moreover, we have to pin point an affinity field. So, the question is
>> what
>> > the configuration we should use.
>> >
>> > Community, any other thoughts/ideas?
>> >
>> > —
>> > Denis
>> >
>> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
>> > alexander.a.paschenko@gmail.com> wrote:
>> > >
>> > > Also MERGE.
>> > >
>> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> > >> You need a hash code only for INSERT operation, right?
>> > >>
>> > >> —
>> > >> Denis
>> > >>
>> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
>> > alexander.a.paschenko@gmail.com> wrote:
>> > >>>
>> > >>> But what if the user works from some kind of console and just types
>> > >>> the queries as text in full and does not bind params via JDBC or
>> > >>> something alike? What if there's no binary object? I don't see why we
>> > >>> should keep the user from usual cache gets in this case. I really
>> like
>> > >>> the idea of supplying the values of distinct fields, thus freeing the
>> > >>> user of the need to mess with objects and builders, AND then just
>> > >>> calculating hash code as suggested before - say, via explicitly
>> > >>> listing participating fields in XML or by marking them with transient
>> > >>> keyword or some annotation.
>> > >>> Actually, I believe that's the only case when we need to generate any
>> > >>> hash codes - when the class is present, we can just get hash code
>> from
>> > >>> its implementation of its method. When there's no class, we generate.
>> > >>> And all that is solely for SQL. For the rest - just throw an
>> exception
>> > >>> when there's no hash code manually set for binary object. I don't see
>> > >>> why we should try to generate anything when the user already is using
>> > >>> Ignite in full, not just via limited interface of SQL.
>> > >>>
>> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> > >>>> Hmm, this is a good question.
>> > >>>>
>> > >>>> If a user doesn’t provide a _key when an INSERT is executed for me
>> it
>> > means that he is not going to use the key later for cache.get/put,
>> DELETE,
>> > UPDATE and other possible operation simply because he doesn’t know how to
>> > reconstruct the key back in his code. If he wants to use the primary key
>> in
>> > the rest of operations then he must provide it at INSERT time.
>> > >>>>
>> > >>>> Do we need this key only for a case when an object is being inserted
>> > into a cache? If it’s so I would auto-generate a key using ‘long’ as a
>> key
>> > type. I do remember that we provided the auto-generation for Spark module
>> > in a some way that may be useful here.
>> > >>>>
>> > >>>> —
>> > >>>> Denis
>> > >>>>
>> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
>> > alexander.a.paschenko@gmail.com> wrote:
>> > >>>>>
>> > >>>>> Denis,
>> > >>>>>
>> > >>>>> That's not what I was asking about.
>> > >>>>> Currently DML implementation allows for dymanic instantiation of
>> > keys,
>> > >>>>> in other words, user does not have to provide value for
>> object-typed
>> > >>>>> _key column - instead, he may supply just field values based on
>> which
>> > >>>>> _key will be dynamically instantiated/binary built. And that's the
>> > >>>>> whole point of this discussion as I see it: what to do when we've
>> > >>>>> binary built classless key that we build ourselves inside SQL
>> engine
>> > >>>>> and don't know how to compute hash code for it?
>> > >>>>>
>> > >>>>> - Alex
>> > >>>>>
>> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> > >>>>>> Alexander,
>> > >>>>>>
>> > >>>>>> As I guess if we have a key without a class then it will be
>> > constructed using a BinaryBuilder instance and it’s user responsibility
>> to
>> > set the has code at the end with BinaryBuilder.hasCode method. Sure, all
>> > this cases must be well-documented in both Java Doc API and Apache Ignite
>> > documentation.
>> > >>>>>>
>> > >>>>>> —
>> > >>>>>> Denis
>> > >>>>>>
>> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
>> > alexander.a.paschenko@gmail.com> wrote:
>> > >>>>>>>
>> > >>>>>>> Dmitry, Denis,
>> > >>>>>>>
>> > >>>>>>> OK, but I think it's necessary to address also the cases when
>> > there's
>> > >>>>>>> no actual class for the key, and its fields are simply declared
>> in
>> > >>>>>>> XML. In this case, there are no fields to be marked transient.
>> > What do
>> > >>>>>>> we do then? List transient fields in XML separately?
>> > >>>>>>>
>> > >>>>>>> - Alex
>> > >>>>>>>
>> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
>> dsetrakyan@apache.org
>> > >:
>> > >>>>>>>> Agree with Denis.
>> > >>>>>>>>
>> > >>>>>>>> - by default, all non-transient key fields should participate in
>> > the
>> > >>>>>>>> hashcode generation
>> > >>>>>>>> - when working on DDL, then the primary key fields should
>> > participate in
>> > >>>>>>>> the hashcode
>> > >>>>>>>> - we should add a resolver to override the default behavior
>> > (please
>> > >>>>>>>> propose the interface in Jira)
>> > >>>>>>>> - we should print out a warning, only once per type, the the
>> > hashcode
>> > >>>>>>>> has been automatically generated based on which fields and which
>> > formula
>> > >>>>>>>>
>> > >>>>>>>> D.
>> > >>>>>>>>
>> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
>> dmagda@gridgain.com>
>> > wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Hi Alexander,
>> > >>>>>>>>>
>> > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must
>> > keep in mind
>> > >>>>>>>>> one important thing. Binary objects were designed to address
>> the
>> > following
>> > >>>>>>>>> disadvantages a regular serializer, like optimized marshaller,
>> > has:
>> > >>>>>>>>> necessity to deserialize an object on a server side every time
>> > it’s needed.
>> > >>>>>>>>> necessity to hold an object in both serialized and deserialized
>> > forms on
>> > >>>>>>>>> the server node.
>> > >>>>>>>>> necessity to restart the whole cluster each time an object
>> > version is
>> > >>>>>>>>> changed (new field is added or an old one is removed).
>> > >>>>>>>>> If it will be needed to perform step 3 for a default
>> > implementation of the
>> > >>>>>>>>> binary resolver just because the resolver has to consider new
>> > fields or
>> > >>>>>>>>> ignore old ones then such an implementation sucks. Overall, the
>> > default
>> > >>>>>>>>> implementation should use the reflection coming over all the
>> > fields a key
>> > >>>>>>>>> has ignoring the ones that are marked with “transient” keyword.
>> > If a user
>> > >>>>>>>>> wants to control the default resolver's logic then he can label
>> > all the
>> > >>>>>>>>> fields that mustn’t be of a final has code value with
>> > “transient”. This has
>> > >>>>>>>>> to be well-documented for sure.
>> > >>>>>>>>>
>> > >>>>>>>>> Makes sense?
>> > >>>>>>>>>
>> > >>>>>>>>> —
>> > >>>>>>>>> Denis
>> > >>>>>>>>>
>> > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>> > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>> Hello Igniters,
>> > >>>>>>>>>>
>> > >>>>>>>>>> As DML support is near, it's critical that we agree on how we
>> > generate
>> > >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
>> > Actually,
>> > >>>>>>>>>> this discussion isn't new - please see its beginning here:
>> > >>>>>>>>>>
>> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>> > >>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
>> > >>>>>>>>>>
>> > >>>>>>>>>> Still, I'm creating this new thread to make getting to the
>> final
>> > >>>>>>>>>> solution as simple and fast as possible.
>> > >>>>>>>>>>
>> > >>>>>>>>>> I remind everyone that the approach that has got the least
>> > critics was
>> > >>>>>>>>>> the one proposed by Vladimir Ozerov:
>> > >>>>>>>>>>
>> > >>>>>>>>>> <quote>
>> > >>>>>>>>>> I think we can do the following:
>> > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>> > >>>>>>>>>> 2) If object without a hash code is put to cache, throw an
>> > exception.
>> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>> > >>>>>>>>>> 4) Add default implementation which will auto-generate hash
>> > code. *Print
>> > >>>>>>>>> a
>> > >>>>>>>>>> warning when auto-generation occurs*, so that user is aware
>> > that he is
>> > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
>> > >>>>>>>>>> 5) Add another implementation which will use encoded string to
>> > calculate
>> > >>>>>>>>> a
>> > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 +
>> > {b}")*.
>> > >>>>>>>>>> Originally proposed by Yakov some time ago.
>> > >>>>>>>>>> </quote>
>> > >>>>>>>>>>
>> > >>>>>>>>>> After that, Sergi suggested that instead of a "formula" we
>> keep
>> > just a
>> > >>>>>>>>>> list of the "fields" that participate in hash code evaluation,
>> > and
>> > >>>>>>>>>> with that list, we simply calculate hash code just like IDE
>> > does -
>> > >>>>>>>>>> with all its bit shifts and additions.
>> > >>>>>>>>>>
>> > >>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi
>> > approach.
>> > >>>>>>>>>> Any objections?
>> > >>>>>>>>>>
>> > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we both
>> > throw an
>> > >>>>>>>>>> exception on cache code absence and that we might generate it
>> > as the
>> > >>>>>>>>>> last resort. Do I understand you correctly that you suggest
>> > generating
>> > >>>>>>>>>> random code only in context of SQL, but throw exception for
>> keys
>> > >>>>>>>>>> without codes on ordinary put?
>> > >>>>>>>>>>
>> > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported as
>> > well as
>> > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of
>> > IGNITE-3633
>> > >>>>>>>>>> for the flag and its presence check).
>> > >>>>>>>>>>
>> > >>>>>>>>>> - Alex
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>
>> > >>>>
>> > >>
>> >
>> >
>>

Re: Default hash code generation strategy for new binary objects

Posted by Alexey Goncharuk <al...@gmail.com>.

Folks, let me point out a few obvious (or not) things

A set of fields participating in hashCode and equals is impossible to
change without cluster restart. Imagine a new client adding a field F to
key structure (A, B) so that new key is (A, B, F). In this case key (1, 2,
0) will be treated as a different key w/respect to (1, 2, 3), while the new
client expects them to be the same. I think the set of fields defining the
key _must_ be used in both hash code and equals calculation and cannot
change over time. Having said that, we can use binary array comparison as
object equality test. Presence of new 'garbage' fields in key looks useless
to me.

So, we should support:
1) All the things Dmitriy pointed out, in other words - multi-field key
with affinity key fields
2) Ability to instantiate the key using a text-only environment
3) Ability to define the structure of a key using XML or DDL

I think all of these can be done via combination of HashCodeResolver
interface + default resolver which can read type configuration.

2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Guys,
>
> We need to look at 3 cases:
>
> a) key is just one field
> b) key is multiple fields
> c) key is one or multiple fields, with possibility of an alternate affinity
> key
>
> For (a) and (b), whenever a type is defined in XML, and further in DML, a
> user will specify which fields are part of the key. In that case, we should
> just grab those fields and calculate the hashcode automatically.
>
> For (c), a user should specify in XML, and further in DML, which fields
> should be used for affinity, in addition to the key fields. In that case,
> we again should grab those fields and calculate the hashcodes for the
> primary key and the affinity key.
>
> I really am not sure if there are other ways of doing it. Am I missing
> something?
>
> D.
>
>
> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com> wrote:
>
> > Let me show the picture I have in my mind:
> >
> > Primary key is a must for all INSERT and MERGE operations. If it’s not
> set
> > then an INSERT/MERGE fails.
> > If a primary key is a boxed/unboxed primitive (int, Integer, String,
> Date,
> > etc.) then the key value is used for hashCode calculation. At the same
> time
> > the key will be an affinity key.
> > If a primary key is a custom object then it’s value can be passed as a
> > param directly from Java, .Net, C++ in a way like “INSERT (_key, field1,
> > field2) VALUES (?, val1, val2)”. In this scenario we will call hashCode
> > directly on the key's value. In addition, we will be able to get an
> > affinity key since we know key’s type class descriptor
> > (BinaryClassDescriptor).
> > If a primary key is still a custom key and we want to insert its value
> > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s value
> > as is. Here we’re trying to apply a workaround by listing key's fields in
> > INSERT/MERGE and the task is to properly re-construct the key on our side
> > using only specific fields.
> >
> > Is my understanding correct for all the bullets above?
> >
> > If so then, yes, I would agree that we need to list these fields in a
> > configuration and the default hash code resolver will use them as well.
> > Moreover, we have to pin point an affinity field. So, the question is
> what
> > the configuration we should use.
> >
> > Community, any other thoughts/ideas?
> >
> > —
> > Denis
> >
> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >
> > > Also MERGE.
> > >
> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > >> You need a hash code only for INSERT operation, right?
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >>>
> > >>> But what if the user works from some kind of console and just types
> > >>> the queries as text in full and does not bind params via JDBC or
> > >>> something alike? What if there's no binary object? I don't see why we
> > >>> should keep the user from usual cache gets in this case. I really
> like
> > >>> the idea of supplying the values of distinct fields, thus freeing the
> > >>> user of the need to mess with objects and builders, AND then just
> > >>> calculating hash code as suggested before - say, via explicitly
> > >>> listing participating fields in XML or by marking them with transient
> > >>> keyword or some annotation.
> > >>> Actually, I believe that's the only case when we need to generate any
> > >>> hash codes - when the class is present, we can just get hash code
> from
> > >>> its implementation of its method. When there's no class, we generate.
> > >>> And all that is solely for SQL. For the rest - just throw an
> exception
> > >>> when there's no hash code manually set for binary object. I don't see
> > >>> why we should try to generate anything when the user already is using
> > >>> Ignite in full, not just via limited interface of SQL.
> > >>>
> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > >>>> Hmm, this is a good question.
> > >>>>
> > >>>> If a user doesn’t provide a _key when an INSERT is executed for me
> it
> > means that he is not going to use the key later for cache.get/put,
> DELETE,
> > UPDATE and other possible operation simply because he doesn’t know how to
> > reconstruct the key back in his code. If he wants to use the primary key
> in
> > the rest of operations then he must provide it at INSERT time.
> > >>>>
> > >>>> Do we need this key only for a case when an object is being inserted
> > into a cache? If it’s so I would auto-generate a key using ‘long’ as a
> key
> > type. I do remember that we provided the auto-generation for Spark module
> > in a some way that may be useful here.
> > >>>>
> > >>>> —
> > >>>> Denis
> > >>>>
> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >>>>>
> > >>>>> Denis,
> > >>>>>
> > >>>>> That's not what I was asking about.
> > >>>>> Currently DML implementation allows for dymanic instantiation of
> > keys,
> > >>>>> in other words, user does not have to provide value for
> object-typed
> > >>>>> _key column - instead, he may supply just field values based on
> which
> > >>>>> _key will be dynamically instantiated/binary built. And that's the
> > >>>>> whole point of this discussion as I see it: what to do when we've
> > >>>>> binary built classless key that we build ourselves inside SQL
> engine
> > >>>>> and don't know how to compute hash code for it?
> > >>>>>
> > >>>>> - Alex
> > >>>>>
> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> > >>>>>> Alexander,
> > >>>>>>
> > >>>>>> As I guess if we have a key without a class then it will be
> > constructed using a BinaryBuilder instance and it’s user responsibility
> to
> > set the has code at the end with BinaryBuilder.hasCode method. Sure, all
> > this cases must be well-documented in both Java Doc API and Apache Ignite
> > documentation.
> > >>>>>>
> > >>>>>> —
> > >>>>>> Denis
> > >>>>>>
> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Dmitry, Denis,
> > >>>>>>>
> > >>>>>>> OK, but I think it's necessary to address also the cases when
> > there's
> > >>>>>>> no actual class for the key, and its fields are simply declared
> in
> > >>>>>>> XML. In this case, there are no fields to be marked transient.
> > What do
> > >>>>>>> we do then? List transient fields in XML separately?
> > >>>>>>>
> > >>>>>>> - Alex
> > >>>>>>>
> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
> dsetrakyan@apache.org
> > >:
> > >>>>>>>> Agree with Denis.
> > >>>>>>>>
> > >>>>>>>> - by default, all non-transient key fields should participate in
> > the
> > >>>>>>>> hashcode generation
> > >>>>>>>> - when working on DDL, then the primary key fields should
> > participate in
> > >>>>>>>> the hashcode
> > >>>>>>>> - we should add a resolver to override the default behavior
> > (please
> > >>>>>>>> propose the interface in Jira)
> > >>>>>>>> - we should print out a warning, only once per type, the the
> > hashcode
> > >>>>>>>> has been automatically generated based on which fields and which
> > formula
> > >>>>>>>>
> > >>>>>>>> D.
> > >>>>>>>>
> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
> dmagda@gridgain.com>
> > wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Alexander,
> > >>>>>>>>>
> > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must
> > keep in mind
> > >>>>>>>>> one important thing. Binary objects were designed to address
> the
> > following
> > >>>>>>>>> disadvantages a regular serializer, like optimized marshaller,
> > has:
> > >>>>>>>>> necessity to deserialize an object on a server side every time
> > it’s needed.
> > >>>>>>>>> necessity to hold an object in both serialized and deserialized
> > forms on
> > >>>>>>>>> the server node.
> > >>>>>>>>> necessity to restart the whole cluster each time an object
> > version is
> > >>>>>>>>> changed (new field is added or an old one is removed).
> > >>>>>>>>> If it will be needed to perform step 3 for a default
> > implementation of the
> > >>>>>>>>> binary resolver just because the resolver has to consider new
> > fields or
> > >>>>>>>>> ignore old ones then such an implementation sucks. Overall, the
> > default
> > >>>>>>>>> implementation should use the reflection coming over all the
> > fields a key
> > >>>>>>>>> has ignoring the ones that are marked with “transient” keyword.
> > If a user
> > >>>>>>>>> wants to control the default resolver's logic then he can label
> > all the
> > >>>>>>>>> fields that mustn’t be of a final has code value with
> > “transient”. This has
> > >>>>>>>>> to be well-documented for sure.
> > >>>>>>>>>
> > >>>>>>>>> Makes sense?
> > >>>>>>>>>
> > >>>>>>>>> —
> > >>>>>>>>> Denis
> > >>>>>>>>>
> > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hello Igniters,
> > >>>>>>>>>>
> > >>>>>>>>>> As DML support is near, it's critical that we agree on how we
> > generate
> > >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
> > Actually,
> > >>>>>>>>>> this discussion isn't new - please see its beginning here:
> > >>>>>>>>>>
> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> > >>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> > >>>>>>>>>>
> > >>>>>>>>>> Still, I'm creating this new thread to make getting to the
> final
> > >>>>>>>>>> solution as simple and fast as possible.
> > >>>>>>>>>>
> > >>>>>>>>>> I remind everyone that the approach that has got the least
> > critics was
> > >>>>>>>>>> the one proposed by Vladimir Ozerov:
> > >>>>>>>>>>
> > >>>>>>>>>> <quote>
> > >>>>>>>>>> I think we can do the following:
> > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> > >>>>>>>>>> 2) If object without a hash code is put to cache, throw an
> > exception.
> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> > >>>>>>>>>> 4) Add default implementation which will auto-generate hash
> > code. *Print
> > >>>>>>>>> a
> > >>>>>>>>>> warning when auto-generation occurs*, so that user is aware
> > that he is
> > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> > >>>>>>>>>> 5) Add another implementation which will use encoded string to
> > calculate
> > >>>>>>>>> a
> > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 +
> > {b}")*.
> > >>>>>>>>>> Originally proposed by Yakov some time ago.
> > >>>>>>>>>> </quote>
> > >>>>>>>>>>
> > >>>>>>>>>> After that, Sergi suggested that instead of a "formula" we
> keep
> > just a
> > >>>>>>>>>> list of the "fields" that participate in hash code evaluation,
> > and
> > >>>>>>>>>> with that list, we simply calculate hash code just like IDE
> > does -
> > >>>>>>>>>> with all its bit shifts and additions.
> > >>>>>>>>>>
> > >>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi
> > approach.
> > >>>>>>>>>> Any objections?
> > >>>>>>>>>>
> > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we both
> > throw an
> > >>>>>>>>>> exception on cache code absence and that we might generate it
> > as the
> > >>>>>>>>>> last resort. Do I understand you correctly that you suggest
> > generating
> > >>>>>>>>>> random code only in context of SQL, but throw exception for
> keys
> > >>>>>>>>>> without codes on ordinary put?
> > >>>>>>>>>>
> > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported as
> > well as
> > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of
> > IGNITE-3633
> > >>>>>>>>>> for the flag and its presence check).
> > >>>>>>>>>>
> > >>>>>>>>>> - Alex
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> >
>

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Guys,

We need to look at 3 cases:

a) key is just one field
b) key is multiple fields
c) key is one or multiple fields, with possibility of an alternate affinity
key

For (a) and (b), whenever a type is defined in XML, and further in DML, a
user will specify which fields are part of the key. In that case, we should
just grab those fields and calculate the hashcode automatically.

For (c), a user should specify in XML, and further in DML, which fields
should be used for affinity, in addition to the key fields. In that case,
we again should grab those fields and calculate the hashcodes for the
primary key and the affinity key.

I really am not sure if there are other ways of doing it. Am I missing
something?

D.


On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dm...@gridgain.com> wrote:

> Let me show the picture I have in my mind:
>
> Primary key is a must for all INSERT and MERGE operations. If it’s not set
> then an INSERT/MERGE fails.
> If a primary key is a boxed/unboxed primitive (int, Integer, String, Date,
> etc.) then the key value is used for hashCode calculation. At the same time
> the key will be an affinity key.
> If a primary key is a custom object then it’s value can be passed as a
> param directly from Java, .Net, C++ in a way like “INSERT (_key, field1,
> field2) VALUES (?, val1, val2)”. In this scenario we will call hashCode
> directly on the key's value. In addition, we will be able to get an
> affinity key since we know key’s type class descriptor
> (BinaryClassDescriptor).
> If a primary key is still a custom key and we want to insert its value
> from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s value
> as is. Here we’re trying to apply a workaround by listing key's fields in
> INSERT/MERGE and the task is to properly re-construct the key on our side
> using only specific fields.
>
> Is my understanding correct for all the bullets above?
>
> If so then, yes, I would agree that we need to list these fields in a
> configuration and the default hash code resolver will use them as well.
> Moreover, we have to pin point an affinity field. So, the question is what
> the configuration we should use.
>
> Community, any other thoughts/ideas?
>
> —
> Denis
>
> > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> alexander.a.paschenko@gmail.com> wrote:
> >
> > Also MERGE.
> >
> > 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> >> You need a hash code only for INSERT operation, right?
> >>
> >> —
> >> Denis
> >>
> >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> alexander.a.paschenko@gmail.com> wrote:
> >>>
> >>> But what if the user works from some kind of console and just types
> >>> the queries as text in full and does not bind params via JDBC or
> >>> something alike? What if there's no binary object? I don't see why we
> >>> should keep the user from usual cache gets in this case. I really like
> >>> the idea of supplying the values of distinct fields, thus freeing the
> >>> user of the need to mess with objects and builders, AND then just
> >>> calculating hash code as suggested before - say, via explicitly
> >>> listing participating fields in XML or by marking them with transient
> >>> keyword or some annotation.
> >>> Actually, I believe that's the only case when we need to generate any
> >>> hash codes - when the class is present, we can just get hash code from
> >>> its implementation of its method. When there's no class, we generate.
> >>> And all that is solely for SQL. For the rest - just throw an exception
> >>> when there's no hash code manually set for binary object. I don't see
> >>> why we should try to generate anything when the user already is using
> >>> Ignite in full, not just via limited interface of SQL.
> >>>
> >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> >>>> Hmm, this is a good question.
> >>>>
> >>>> If a user doesn’t provide a _key when an INSERT is executed for me it
> means that he is not going to use the key later for cache.get/put, DELETE,
> UPDATE and other possible operation simply because he doesn’t know how to
> reconstruct the key back in his code. If he wants to use the primary key in
> the rest of operations then he must provide it at INSERT time.
> >>>>
> >>>> Do we need this key only for a case when an object is being inserted
> into a cache? If it’s so I would auto-generate a key using ‘long’ as a key
> type. I do remember that we provided the auto-generation for Spark module
> in a some way that may be useful here.
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> alexander.a.paschenko@gmail.com> wrote:
> >>>>>
> >>>>> Denis,
> >>>>>
> >>>>> That's not what I was asking about.
> >>>>> Currently DML implementation allows for dymanic instantiation of
> keys,
> >>>>> in other words, user does not have to provide value for object-typed
> >>>>> _key column - instead, he may supply just field values based on which
> >>>>> _key will be dynamically instantiated/binary built. And that's the
> >>>>> whole point of this discussion as I see it: what to do when we've
> >>>>> binary built classless key that we build ourselves inside SQL engine
> >>>>> and don't know how to compute hash code for it?
> >>>>>
> >>>>> - Alex
> >>>>>
> >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> >>>>>> Alexander,
> >>>>>>
> >>>>>> As I guess if we have a key without a class then it will be
> constructed using a BinaryBuilder instance and it’s user responsibility to
> set the has code at the end with BinaryBuilder.hasCode method. Sure, all
> this cases must be well-documented in both Java Doc API and Apache Ignite
> documentation.
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> alexander.a.paschenko@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Dmitry, Denis,
> >>>>>>>
> >>>>>>> OK, but I think it's necessary to address also the cases when
> there's
> >>>>>>> no actual class for the key, and its fields are simply declared in
> >>>>>>> XML. In this case, there are no fields to be marked transient.
> What do
> >>>>>>> we do then? List transient fields in XML separately?
> >>>>>>>
> >>>>>>> - Alex
> >>>>>>>
> >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org
> >:
> >>>>>>>> Agree with Denis.
> >>>>>>>>
> >>>>>>>> - by default, all non-transient key fields should participate in
> the
> >>>>>>>> hashcode generation
> >>>>>>>> - when working on DDL, then the primary key fields should
> participate in
> >>>>>>>> the hashcode
> >>>>>>>> - we should add a resolver to override the default behavior
> (please
> >>>>>>>> propose the interface in Jira)
> >>>>>>>> - we should print out a warning, only once per type, the the
> hashcode
> >>>>>>>> has been automatically generated based on which fields and which
> formula
> >>>>>>>>
> >>>>>>>> D.
> >>>>>>>>
> >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com>
> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Alexander,
> >>>>>>>>>
> >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must
> keep in mind
> >>>>>>>>> one important thing. Binary objects were designed to address the
> following
> >>>>>>>>> disadvantages a regular serializer, like optimized marshaller,
> has:
> >>>>>>>>> necessity to deserialize an object on a server side every time
> it’s needed.
> >>>>>>>>> necessity to hold an object in both serialized and deserialized
> forms on
> >>>>>>>>> the server node.
> >>>>>>>>> necessity to restart the whole cluster each time an object
> version is
> >>>>>>>>> changed (new field is added or an old one is removed).
> >>>>>>>>> If it will be needed to perform step 3 for a default
> implementation of the
> >>>>>>>>> binary resolver just because the resolver has to consider new
> fields or
> >>>>>>>>> ignore old ones then such an implementation sucks. Overall, the
> default
> >>>>>>>>> implementation should use the reflection coming over all the
> fields a key
> >>>>>>>>> has ignoring the ones that are marked with “transient” keyword.
> If a user
> >>>>>>>>> wants to control the default resolver's logic then he can label
> all the
> >>>>>>>>> fields that mustn’t be of a final has code value with
> “transient”. This has
> >>>>>>>>> to be well-documented for sure.
> >>>>>>>>>
> >>>>>>>>> Makes sense?
> >>>>>>>>>
> >>>>>>>>> —
> >>>>>>>>> Denis
> >>>>>>>>>
> >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello Igniters,
> >>>>>>>>>>
> >>>>>>>>>> As DML support is near, it's critical that we agree on how we
> generate
> >>>>>>>>>> hash codes for new keys in presence of binary marshaller.
> Actually,
> >>>>>>>>>> this discussion isn't new - please see its beginning here:
> >>>>>>>>>>
> >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> >>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
> >>>>>>>>> at-the-same-partition-by-default-td8042.html
> >>>>>>>>>>
> >>>>>>>>>> Still, I'm creating this new thread to make getting to the final
> >>>>>>>>>> solution as simple and fast as possible.
> >>>>>>>>>>
> >>>>>>>>>> I remind everyone that the approach that has got the least
> critics was
> >>>>>>>>>> the one proposed by Vladimir Ozerov:
> >>>>>>>>>>
> >>>>>>>>>> <quote>
> >>>>>>>>>> I think we can do the following:
> >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> >>>>>>>>>> 2) If object without a hash code is put to cache, throw an
> exception.
> >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> >>>>>>>>>> 4) Add default implementation which will auto-generate hash
> code. *Print
> >>>>>>>>> a
> >>>>>>>>>> warning when auto-generation occurs*, so that user is aware
> that he is
> >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> >>>>>>>>>> 5) Add another implementation which will use encoded string to
> calculate
> >>>>>>>>> a
> >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 +
> {b}")*.
> >>>>>>>>>> Originally proposed by Yakov some time ago.
> >>>>>>>>>> </quote>
> >>>>>>>>>>
> >>>>>>>>>> After that, Sergi suggested that instead of a "formula" we keep
> just a
> >>>>>>>>>> list of the "fields" that participate in hash code evaluation,
> and
> >>>>>>>>>> with that list, we simply calculate hash code just like IDE
> does -
> >>>>>>>>>> with all its bit shifts and additions.
> >>>>>>>>>>
> >>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi
> approach.
> >>>>>>>>>> Any objections?
> >>>>>>>>>>
> >>>>>>>>>> And an extra question I have: Vlad, you suggest that we both
> throw an
> >>>>>>>>>> exception on cache code absence and that we might generate it
> as the
> >>>>>>>>>> last resort. Do I understand you correctly that you suggest
> generating
> >>>>>>>>>> random code only in context of SQL, but throw exception for keys
> >>>>>>>>>> without codes on ordinary put?
> >>>>>>>>>>
> >>>>>>>>>> And yes, built-in hash codes for JDK types are supported as
> well as
> >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of
> IGNITE-3633
> >>>>>>>>>> for the flag and its presence check).
> >>>>>>>>>>
> >>>>>>>>>> - Alex
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>>>
> >>
>
>

Re: Default hash code generation strategy for new binary objects

Posted by Denis Magda <dm...@gridgain.com>.

Let me show the picture I have in my mind:

Primary key is a must for all INSERT and MERGE operations. If it’s not set then an INSERT/MERGE fails.
If a primary key is a boxed/unboxed primitive (int, Integer, String, Date, etc.) then the key value is used for hashCode calculation. At the same time the key will be an affinity key.
If a primary key is a custom object then it’s value can be passed as a param directly from Java, .Net, C++ in a way like “INSERT (_key, field1, field2) VALUES (?, val1, val2)”. In this scenario we will call hashCode directly on the key's value. In addition, we will be able to get an affinity key since we know key’s type class descriptor (BinaryClassDescriptor).
If a primary key is still a custom key and we want to insert its value from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s value as is. Here we’re trying to apply a workaround by listing key's fields in INSERT/MERGE and the task is to properly re-construct the key on our side using only specific fields.  

Is my understanding correct for all the bullets above? 

If so then, yes, I would agree that we need to list these fields in a configuration and the default hash code resolver will use them as well. Moreover, we have to pin point an affinity field. So, the question is what the configuration we should use.

Community, any other thoughts/ideas?

—
Denis 

> On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <al...@gmail.com> wrote:
> 
> Also MERGE.
> 
> 2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> You need a hash code only for INSERT operation, right?
>> 
>> —
>> Denis
>> 
>>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <al...@gmail.com> wrote:
>>> 
>>> But what if the user works from some kind of console and just types
>>> the queries as text in full and does not bind params via JDBC or
>>> something alike? What if there's no binary object? I don't see why we
>>> should keep the user from usual cache gets in this case. I really like
>>> the idea of supplying the values of distinct fields, thus freeing the
>>> user of the need to mess with objects and builders, AND then just
>>> calculating hash code as suggested before - say, via explicitly
>>> listing participating fields in XML or by marking them with transient
>>> keyword or some annotation.
>>> Actually, I believe that's the only case when we need to generate any
>>> hash codes - when the class is present, we can just get hash code from
>>> its implementation of its method. When there's no class, we generate.
>>> And all that is solely for SQL. For the rest - just throw an exception
>>> when there's no hash code manually set for binary object. I don't see
>>> why we should try to generate anything when the user already is using
>>> Ignite in full, not just via limited interface of SQL.
>>> 
>>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>>>> Hmm, this is a good question.
>>>> 
>>>> If a user doesn’t provide a _key when an INSERT is executed for me it means that he is not going to use the key later for cache.get/put, DELETE, UPDATE and other possible operation simply because he doesn’t know how to reconstruct the key back in his code. If he wants to use the primary key in the rest of operations then he must provide it at INSERT time.
>>>> 
>>>> Do we need this key only for a case when an object is being inserted into a cache? If it’s so I would auto-generate a key using ‘long’ as a key type. I do remember that we provided the auto-generation for Spark module in a some way that may be useful here.
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>>>> 
>>>>> Denis,
>>>>> 
>>>>> That's not what I was asking about.
>>>>> Currently DML implementation allows for dymanic instantiation of keys,
>>>>> in other words, user does not have to provide value for object-typed
>>>>> _key column - instead, he may supply just field values based on which
>>>>> _key will be dynamically instantiated/binary built. And that's the
>>>>> whole point of this discussion as I see it: what to do when we've
>>>>> binary built classless key that we build ourselves inside SQL engine
>>>>> and don't know how to compute hash code for it?
>>>>> 
>>>>> - Alex
>>>>> 
>>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>>>>>> Alexander,
>>>>>> 
>>>>>> As I guess if we have a key without a class then it will be constructed using a BinaryBuilder instance and it’s user responsibility to set the has code at the end with BinaryBuilder.hasCode method. Sure, all this cases must be well-documented in both Java Doc API and Apache Ignite documentation.
>>>>>> 
>>>>>> —
>>>>>> Denis
>>>>>> 
>>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Dmitry, Denis,
>>>>>>> 
>>>>>>> OK, but I think it's necessary to address also the cases when there's
>>>>>>> no actual class for the key, and its fields are simply declared in
>>>>>>> XML. In this case, there are no fields to be marked transient. What do
>>>>>>> we do then? List transient fields in XML separately?
>>>>>>> 
>>>>>>> - Alex
>>>>>>> 
>>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>>>>>>> Agree with Denis.
>>>>>>>> 
>>>>>>>> - by default, all non-transient key fields should participate in the
>>>>>>>> hashcode generation
>>>>>>>> - when working on DDL, then the primary key fields should participate in
>>>>>>>> the hashcode
>>>>>>>> - we should add a resolver to override the default behavior (please
>>>>>>>> propose the interface in Jira)
>>>>>>>> - we should print out a warning, only once per type, the the hashcode
>>>>>>>> has been automatically generated based on which fields and which formula
>>>>>>>> 
>>>>>>>> D.
>>>>>>>> 
>>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Alexander,
>>>>>>>>> 
>>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>>>>>>>>> one important thing. Binary objects were designed to address the following
>>>>>>>>> disadvantages a regular serializer, like optimized marshaller, has:
>>>>>>>>> necessity to deserialize an object on a server side every time it’s needed.
>>>>>>>>> necessity to hold an object in both serialized and deserialized forms on
>>>>>>>>> the server node.
>>>>>>>>> necessity to restart the whole cluster each time an object version is
>>>>>>>>> changed (new field is added or an old one is removed).
>>>>>>>>> If it will be needed to perform step 3 for a default implementation of the
>>>>>>>>> binary resolver just because the resolver has to consider new fields or
>>>>>>>>> ignore old ones then such an implementation sucks. Overall, the default
>>>>>>>>> implementation should use the reflection coming over all the fields a key
>>>>>>>>> has ignoring the ones that are marked with “transient” keyword. If a user
>>>>>>>>> wants to control the default resolver's logic then he can label all the
>>>>>>>>> fields that mustn’t be of a final has code value with “transient”. This has
>>>>>>>>> to be well-documented for sure.
>>>>>>>>> 
>>>>>>>>> Makes sense?
>>>>>>>>> 
>>>>>>>>> —
>>>>>>>>> Denis
>>>>>>>>> 
>>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hello Igniters,
>>>>>>>>>> 
>>>>>>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>>>>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>>>>>>>> this discussion isn't new - please see its beginning here:
>>>>>>>>>> 
>>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>>>>>>>>> at-the-same-partition-by-default-td8042.html
>>>>>>>>>> 
>>>>>>>>>> Still, I'm creating this new thread to make getting to the final
>>>>>>>>>> solution as simple and fast as possible.
>>>>>>>>>> 
>>>>>>>>>> I remind everyone that the approach that has got the least critics was
>>>>>>>>>> the one proposed by Vladimir Ozerov:
>>>>>>>>>> 
>>>>>>>>>> <quote>
>>>>>>>>>> I think we can do the following:
>>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>>>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>>>>>>>>>> 4) Add default implementation which will auto-generate hash code. *Print
>>>>>>>>> a
>>>>>>>>>> warning when auto-generation occurs*, so that user is aware that he is
>>>>>>>>>> likely to have problems with normal GETs/PUTs.
>>>>>>>>>> 5) Add another implementation which will use encoded string to calculate
>>>>>>>>> a
>>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>>>>>>> Originally proposed by Yakov some time ago.
>>>>>>>>>> </quote>
>>>>>>>>>> 
>>>>>>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>>>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>>>>>>> with that list, we simply calculate hash code just like IDE does -
>>>>>>>>>> with all its bit shifts and additions.
>>>>>>>>>> 
>>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>>>>>>>> Any objections?
>>>>>>>>>> 
>>>>>>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>>>>>>> exception on cache code absence and that we might generate it as the
>>>>>>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>>>>>>> random code only in context of SQL, but throw exception for keys
>>>>>>>>>> without codes on ordinary put?
>>>>>>>>>> 
>>>>>>>>>> And yes, built-in hash codes for JDK types are supported as well as
>>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>>>>>>>>>> for the flag and its presence check).
>>>>>>>>>> 
>>>>>>>>>> - Alex
>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>> 
>>

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

Also MERGE.

2016-09-29 2:10 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> You need a hash code only for INSERT operation, right?
>
> —
> Denis
>
>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <al...@gmail.com> wrote:
>>
>> But what if the user works from some kind of console and just types
>> the queries as text in full and does not bind params via JDBC or
>> something alike? What if there's no binary object? I don't see why we
>> should keep the user from usual cache gets in this case. I really like
>> the idea of supplying the values of distinct fields, thus freeing the
>> user of the need to mess with objects and builders, AND then just
>> calculating hash code as suggested before - say, via explicitly
>> listing participating fields in XML or by marking them with transient
>> keyword or some annotation.
>> Actually, I believe that's the only case when we need to generate any
>> hash codes - when the class is present, we can just get hash code from
>> its implementation of its method. When there's no class, we generate.
>> And all that is solely for SQL. For the rest - just throw an exception
>> when there's no hash code manually set for binary object. I don't see
>> why we should try to generate anything when the user already is using
>> Ignite in full, not just via limited interface of SQL.
>>
>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>>> Hmm, this is a good question.
>>>
>>> If a user doesn’t provide a _key when an INSERT is executed for me it means that he is not going to use the key later for cache.get/put, DELETE, UPDATE and other possible operation simply because he doesn’t know how to reconstruct the key back in his code. If he wants to use the primary key in the rest of operations then he must provide it at INSERT time.
>>>
>>> Do we need this key only for a case when an object is being inserted into a cache? If it’s so I would auto-generate a key using ‘long’ as a key type. I do remember that we provided the auto-generation for Spark module in a some way that may be useful here.
>>>
>>> —
>>> Denis
>>>
>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>>>
>>>> Denis,
>>>>
>>>> That's not what I was asking about.
>>>> Currently DML implementation allows for dymanic instantiation of keys,
>>>> in other words, user does not have to provide value for object-typed
>>>> _key column - instead, he may supply just field values based on which
>>>> _key will be dynamically instantiated/binary built. And that's the
>>>> whole point of this discussion as I see it: what to do when we've
>>>> binary built classless key that we build ourselves inside SQL engine
>>>> and don't know how to compute hash code for it?
>>>>
>>>> - Alex
>>>>
>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>>>>> Alexander,
>>>>>
>>>>> As I guess if we have a key without a class then it will be constructed using a BinaryBuilder instance and it’s user responsibility to set the has code at the end with BinaryBuilder.hasCode method. Sure, all this cases must be well-documented in both Java Doc API and Apache Ignite documentation.
>>>>>
>>>>> —
>>>>> Denis
>>>>>
>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>>>>>
>>>>>> Dmitry, Denis,
>>>>>>
>>>>>> OK, but I think it's necessary to address also the cases when there's
>>>>>> no actual class for the key, and its fields are simply declared in
>>>>>> XML. In this case, there are no fields to be marked transient. What do
>>>>>> we do then? List transient fields in XML separately?
>>>>>>
>>>>>> - Alex
>>>>>>
>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>>>>>> Agree with Denis.
>>>>>>>
>>>>>>> - by default, all non-transient key fields should participate in the
>>>>>>> hashcode generation
>>>>>>> - when working on DDL, then the primary key fields should participate in
>>>>>>> the hashcode
>>>>>>> - we should add a resolver to override the default behavior (please
>>>>>>> propose the interface in Jira)
>>>>>>> - we should print out a warning, only once per type, the the hashcode
>>>>>>> has been automatically generated based on which fields and which formula
>>>>>>>
>>>>>>> D.
>>>>>>>
>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>>>>>>>
>>>>>>>> Hi Alexander,
>>>>>>>>
>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>>>>>>>> one important thing. Binary objects were designed to address the following
>>>>>>>> disadvantages a regular serializer, like optimized marshaller, has:
>>>>>>>> necessity to deserialize an object on a server side every time it’s needed.
>>>>>>>> necessity to hold an object in both serialized and deserialized forms on
>>>>>>>> the server node.
>>>>>>>> necessity to restart the whole cluster each time an object version is
>>>>>>>> changed (new field is added or an old one is removed).
>>>>>>>> If it will be needed to perform step 3 for a default implementation of the
>>>>>>>> binary resolver just because the resolver has to consider new fields or
>>>>>>>> ignore old ones then such an implementation sucks. Overall, the default
>>>>>>>> implementation should use the reflection coming over all the fields a key
>>>>>>>> has ignoring the ones that are marked with “transient” keyword. If a user
>>>>>>>> wants to control the default resolver's logic then he can label all the
>>>>>>>> fields that mustn’t be of a final has code value with “transient”. This has
>>>>>>>> to be well-documented for sure.
>>>>>>>>
>>>>>>>> Makes sense?
>>>>>>>>
>>>>>>>> —
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello Igniters,
>>>>>>>>>
>>>>>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>>>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>>>>>>> this discussion isn't new - please see its beginning here:
>>>>>>>>>
>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>>>>>>>> at-the-same-partition-by-default-td8042.html
>>>>>>>>>
>>>>>>>>> Still, I'm creating this new thread to make getting to the final
>>>>>>>>> solution as simple and fast as possible.
>>>>>>>>>
>>>>>>>>> I remind everyone that the approach that has got the least critics was
>>>>>>>>> the one proposed by Vladimir Ozerov:
>>>>>>>>>
>>>>>>>>> <quote>
>>>>>>>>> I think we can do the following:
>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>>>>>>>>> 4) Add default implementation which will auto-generate hash code. *Print
>>>>>>>> a
>>>>>>>>> warning when auto-generation occurs*, so that user is aware that he is
>>>>>>>>> likely to have problems with normal GETs/PUTs.
>>>>>>>>> 5) Add another implementation which will use encoded string to calculate
>>>>>>>> a
>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>>>>>> Originally proposed by Yakov some time ago.
>>>>>>>>> </quote>
>>>>>>>>>
>>>>>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>>>>>> with that list, we simply calculate hash code just like IDE does -
>>>>>>>>> with all its bit shifts and additions.
>>>>>>>>>
>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>>>>>>> Any objections?
>>>>>>>>>
>>>>>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>>>>>> exception on cache code absence and that we might generate it as the
>>>>>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>>>>>> random code only in context of SQL, but throw exception for keys
>>>>>>>>> without codes on ordinary put?
>>>>>>>>>
>>>>>>>>> And yes, built-in hash codes for JDK types are supported as well as
>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>>>>>>>>> for the flag and its presence check).
>>>>>>>>>
>>>>>>>>> - Alex
>>>>>>>>
>>>>>>>>
>>>>>
>>>
>

Re: Default hash code generation strategy for new binary objects

Posted by Denis Magda <dm...@gridgain.com>.

You need a hash code only for INSERT operation, right?

—
Denis

> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <al...@gmail.com> wrote:
> 
> But what if the user works from some kind of console and just types
> the queries as text in full and does not bind params via JDBC or
> something alike? What if there's no binary object? I don't see why we
> should keep the user from usual cache gets in this case. I really like
> the idea of supplying the values of distinct fields, thus freeing the
> user of the need to mess with objects and builders, AND then just
> calculating hash code as suggested before - say, via explicitly
> listing participating fields in XML or by marking them with transient
> keyword or some annotation.
> Actually, I believe that's the only case when we need to generate any
> hash codes - when the class is present, we can just get hash code from
> its implementation of its method. When there's no class, we generate.
> And all that is solely for SQL. For the rest - just throw an exception
> when there's no hash code manually set for binary object. I don't see
> why we should try to generate anything when the user already is using
> Ignite in full, not just via limited interface of SQL.
> 
> 2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> Hmm, this is a good question.
>> 
>> If a user doesn’t provide a _key when an INSERT is executed for me it means that he is not going to use the key later for cache.get/put, DELETE, UPDATE and other possible operation simply because he doesn’t know how to reconstruct the key back in his code. If he wants to use the primary key in the rest of operations then he must provide it at INSERT time.
>> 
>> Do we need this key only for a case when an object is being inserted into a cache? If it’s so I would auto-generate a key using ‘long’ as a key type. I do remember that we provided the auto-generation for Spark module in a some way that may be useful here.
>> 
>> —
>> Denis
>> 
>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>> 
>>> Denis,
>>> 
>>> That's not what I was asking about.
>>> Currently DML implementation allows for dymanic instantiation of keys,
>>> in other words, user does not have to provide value for object-typed
>>> _key column - instead, he may supply just field values based on which
>>> _key will be dynamically instantiated/binary built. And that's the
>>> whole point of this discussion as I see it: what to do when we've
>>> binary built classless key that we build ourselves inside SQL engine
>>> and don't know how to compute hash code for it?
>>> 
>>> - Alex
>>> 
>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>>>> Alexander,
>>>> 
>>>> As I guess if we have a key without a class then it will be constructed using a BinaryBuilder instance and it’s user responsibility to set the has code at the end with BinaryBuilder.hasCode method. Sure, all this cases must be well-documented in both Java Doc API and Apache Ignite documentation.
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>>>> 
>>>>> Dmitry, Denis,
>>>>> 
>>>>> OK, but I think it's necessary to address also the cases when there's
>>>>> no actual class for the key, and its fields are simply declared in
>>>>> XML. In this case, there are no fields to be marked transient. What do
>>>>> we do then? List transient fields in XML separately?
>>>>> 
>>>>> - Alex
>>>>> 
>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>>>>> Agree with Denis.
>>>>>> 
>>>>>> - by default, all non-transient key fields should participate in the
>>>>>> hashcode generation
>>>>>> - when working on DDL, then the primary key fields should participate in
>>>>>> the hashcode
>>>>>> - we should add a resolver to override the default behavior (please
>>>>>> propose the interface in Jira)
>>>>>> - we should print out a warning, only once per type, the the hashcode
>>>>>> has been automatically generated based on which fields and which formula
>>>>>> 
>>>>>> D.
>>>>>> 
>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>>>>>> 
>>>>>>> Hi Alexander,
>>>>>>> 
>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>>>>>>> one important thing. Binary objects were designed to address the following
>>>>>>> disadvantages a regular serializer, like optimized marshaller, has:
>>>>>>> necessity to deserialize an object on a server side every time it’s needed.
>>>>>>> necessity to hold an object in both serialized and deserialized forms on
>>>>>>> the server node.
>>>>>>> necessity to restart the whole cluster each time an object version is
>>>>>>> changed (new field is added or an old one is removed).
>>>>>>> If it will be needed to perform step 3 for a default implementation of the
>>>>>>> binary resolver just because the resolver has to consider new fields or
>>>>>>> ignore old ones then such an implementation sucks. Overall, the default
>>>>>>> implementation should use the reflection coming over all the fields a key
>>>>>>> has ignoring the ones that are marked with “transient” keyword. If a user
>>>>>>> wants to control the default resolver's logic then he can label all the
>>>>>>> fields that mustn’t be of a final has code value with “transient”. This has
>>>>>>> to be well-documented for sure.
>>>>>>> 
>>>>>>> Makes sense?
>>>>>>> 
>>>>>>> —
>>>>>>> Denis
>>>>>>> 
>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>>>>>> alexander.a.paschenko@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hello Igniters,
>>>>>>>> 
>>>>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>>>>>> this discussion isn't new - please see its beginning here:
>>>>>>>> 
>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>>>>>>> at-the-same-partition-by-default-td8042.html
>>>>>>>> 
>>>>>>>> Still, I'm creating this new thread to make getting to the final
>>>>>>>> solution as simple and fast as possible.
>>>>>>>> 
>>>>>>>> I remind everyone that the approach that has got the least critics was
>>>>>>>> the one proposed by Vladimir Ozerov:
>>>>>>>> 
>>>>>>>> <quote>
>>>>>>>> I think we can do the following:
>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>>>>>>>> 4) Add default implementation which will auto-generate hash code. *Print
>>>>>>> a
>>>>>>>> warning when auto-generation occurs*, so that user is aware that he is
>>>>>>>> likely to have problems with normal GETs/PUTs.
>>>>>>>> 5) Add another implementation which will use encoded string to calculate
>>>>>>> a
>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>>>>> Originally proposed by Yakov some time ago.
>>>>>>>> </quote>
>>>>>>>> 
>>>>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>>>>> with that list, we simply calculate hash code just like IDE does -
>>>>>>>> with all its bit shifts and additions.
>>>>>>>> 
>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>>>>>> Any objections?
>>>>>>>> 
>>>>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>>>>> exception on cache code absence and that we might generate it as the
>>>>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>>>>> random code only in context of SQL, but throw exception for keys
>>>>>>>> without codes on ordinary put?
>>>>>>>> 
>>>>>>>> And yes, built-in hash codes for JDK types are supported as well as
>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>>>>>>>> for the flag and its presence check).
>>>>>>>> 
>>>>>>>> - Alex
>>>>>>> 
>>>>>>> 
>>>> 
>>

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

But what if the user works from some kind of console and just types
the queries as text in full and does not bind params via JDBC or
something alike? What if there's no binary object? I don't see why we
should keep the user from usual cache gets in this case. I really like
the idea of supplying the values of distinct fields, thus freeing the
user of the need to mess with objects and builders, AND then just
calculating hash code as suggested before - say, via explicitly
listing participating fields in XML or by marking them with transient
keyword or some annotation.
Actually, I believe that's the only case when we need to generate any
hash codes - when the class is present, we can just get hash code from
its implementation of its method. When there's no class, we generate.
And all that is solely for SQL. For the rest - just throw an exception
when there's no hash code manually set for binary object. I don't see
why we should try to generate anything when the user already is using
Ignite in full, not just via limited interface of SQL.

2016-09-29 0:31 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> Hmm, this is a good question.
>
> If a user doesn’t provide a _key when an INSERT is executed for me it means that he is not going to use the key later for cache.get/put, DELETE, UPDATE and other possible operation simply because he doesn’t know how to reconstruct the key back in his code. If he wants to use the primary key in the rest of operations then he must provide it at INSERT time.
>
> Do we need this key only for a case when an object is being inserted into a cache? If it’s so I would auto-generate a key using ‘long’ as a key type. I do remember that we provided the auto-generation for Spark module in a some way that may be useful here.
>
> —
> Denis
>
>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>
>> Denis,
>>
>> That's not what I was asking about.
>> Currently DML implementation allows for dymanic instantiation of keys,
>> in other words, user does not have to provide value for object-typed
>> _key column - instead, he may supply just field values based on which
>> _key will be dynamically instantiated/binary built. And that's the
>> whole point of this discussion as I see it: what to do when we've
>> binary built classless key that we build ourselves inside SQL engine
>> and don't know how to compute hash code for it?
>>
>> - Alex
>>
>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>>> Alexander,
>>>
>>> As I guess if we have a key without a class then it will be constructed using a BinaryBuilder instance and it’s user responsibility to set the has code at the end with BinaryBuilder.hasCode method. Sure, all this cases must be well-documented in both Java Doc API and Apache Ignite documentation.
>>>
>>> —
>>> Denis
>>>
>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>>>
>>>> Dmitry, Denis,
>>>>
>>>> OK, but I think it's necessary to address also the cases when there's
>>>> no actual class for the key, and its fields are simply declared in
>>>> XML. In this case, there are no fields to be marked transient. What do
>>>> we do then? List transient fields in XML separately?
>>>>
>>>> - Alex
>>>>
>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>>>> Agree with Denis.
>>>>>
>>>>>  - by default, all non-transient key fields should participate in the
>>>>>  hashcode generation
>>>>>  - when working on DDL, then the primary key fields should participate in
>>>>>  the hashcode
>>>>>  - we should add a resolver to override the default behavior (please
>>>>>  propose the interface in Jira)
>>>>>  - we should print out a warning, only once per type, the the hashcode
>>>>>  has been automatically generated based on which fields and which formula
>>>>>
>>>>> D.
>>>>>
>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>>>>>
>>>>>> Hi Alexander,
>>>>>>
>>>>>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>>>>>> one important thing. Binary objects were designed to address the following
>>>>>> disadvantages a regular serializer, like optimized marshaller, has:
>>>>>> necessity to deserialize an object on a server side every time it’s needed.
>>>>>> necessity to hold an object in both serialized and deserialized forms on
>>>>>> the server node.
>>>>>> necessity to restart the whole cluster each time an object version is
>>>>>> changed (new field is added or an old one is removed).
>>>>>> If it will be needed to perform step 3 for a default implementation of the
>>>>>> binary resolver just because the resolver has to consider new fields or
>>>>>> ignore old ones then such an implementation sucks. Overall, the default
>>>>>> implementation should use the reflection coming over all the fields a key
>>>>>> has ignoring the ones that are marked with “transient” keyword. If a user
>>>>>> wants to control the default resolver's logic then he can label all the
>>>>>> fields that mustn’t be of a final has code value with “transient”. This has
>>>>>> to be well-documented for sure.
>>>>>>
>>>>>> Makes sense?
>>>>>>
>>>>>> —
>>>>>> Denis
>>>>>>
>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>>>>> alexander.a.paschenko@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello Igniters,
>>>>>>>
>>>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>>>>> this discussion isn't new - please see its beginning here:
>>>>>>>
>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>>>>>> at-the-same-partition-by-default-td8042.html
>>>>>>>
>>>>>>> Still, I'm creating this new thread to make getting to the final
>>>>>>> solution as simple and fast as possible.
>>>>>>>
>>>>>>> I remind everyone that the approach that has got the least critics was
>>>>>>> the one proposed by Vladimir Ozerov:
>>>>>>>
>>>>>>> <quote>
>>>>>>> I think we can do the following:
>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>>>>>>> 4) Add default implementation which will auto-generate hash code. *Print
>>>>>> a
>>>>>>> warning when auto-generation occurs*, so that user is aware that he is
>>>>>>> likely to have problems with normal GETs/PUTs.
>>>>>>> 5) Add another implementation which will use encoded string to calculate
>>>>>> a
>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>>>> Originally proposed by Yakov some time ago.
>>>>>>> </quote>
>>>>>>>
>>>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>>>> with that list, we simply calculate hash code just like IDE does -
>>>>>>> with all its bit shifts and additions.
>>>>>>>
>>>>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>>>>> Any objections?
>>>>>>>
>>>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>>>> exception on cache code absence and that we might generate it as the
>>>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>>>> random code only in context of SQL, but throw exception for keys
>>>>>>> without codes on ordinary put?
>>>>>>>
>>>>>>> And yes, built-in hash codes for JDK types are supported as well as
>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>>>>>>> for the flag and its presence check).
>>>>>>>
>>>>>>> - Alex
>>>>>>
>>>>>>
>>>
>

Re: Default hash code generation strategy for new binary objects

Posted by Denis Magda <dm...@gridgain.com>.

Hmm, this is a good question.

If a user doesn’t provide a _key when an INSERT is executed for me it means that he is not going to use the key later for cache.get/put, DELETE, UPDATE and other possible operation simply because he doesn’t know how to reconstruct the key back in his code. If he wants to use the primary key in the rest of operations then he must provide it at INSERT time. 

Do we need this key only for a case when an object is being inserted into a cache? If it’s so I would auto-generate a key using ‘long’ as a key type. I do remember that we provided the auto-generation for Spark module in a some way that may be useful here.

—
Denis  
 
> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <al...@gmail.com> wrote:
> 
> Denis,
> 
> That's not what I was asking about.
> Currently DML implementation allows for dymanic instantiation of keys,
> in other words, user does not have to provide value for object-typed
> _key column - instead, he may supply just field values based on which
> _key will be dynamically instantiated/binary built. And that's the
> whole point of this discussion as I see it: what to do when we've
> binary built classless key that we build ourselves inside SQL engine
> and don't know how to compute hash code for it?
> 
> - Alex
> 
> 2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
>> Alexander,
>> 
>> As I guess if we have a key without a class then it will be constructed using a BinaryBuilder instance and it’s user responsibility to set the has code at the end with BinaryBuilder.hasCode method. Sure, all this cases must be well-documented in both Java Doc API and Apache Ignite documentation.
>> 
>> —
>> Denis
>> 
>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>> 
>>> Dmitry, Denis,
>>> 
>>> OK, but I think it's necessary to address also the cases when there's
>>> no actual class for the key, and its fields are simply declared in
>>> XML. In this case, there are no fields to be marked transient. What do
>>> we do then? List transient fields in XML separately?
>>> 
>>> - Alex
>>> 
>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>>> Agree with Denis.
>>>> 
>>>>  - by default, all non-transient key fields should participate in the
>>>>  hashcode generation
>>>>  - when working on DDL, then the primary key fields should participate in
>>>>  the hashcode
>>>>  - we should add a resolver to override the default behavior (please
>>>>  propose the interface in Jira)
>>>>  - we should print out a warning, only once per type, the the hashcode
>>>>  has been automatically generated based on which fields and which formula
>>>> 
>>>> D.
>>>> 
>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>>>> 
>>>>> Hi Alexander,
>>>>> 
>>>>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>>>>> one important thing. Binary objects were designed to address the following
>>>>> disadvantages a regular serializer, like optimized marshaller, has:
>>>>> necessity to deserialize an object on a server side every time it’s needed.
>>>>> necessity to hold an object in both serialized and deserialized forms on
>>>>> the server node.
>>>>> necessity to restart the whole cluster each time an object version is
>>>>> changed (new field is added or an old one is removed).
>>>>> If it will be needed to perform step 3 for a default implementation of the
>>>>> binary resolver just because the resolver has to consider new fields or
>>>>> ignore old ones then such an implementation sucks. Overall, the default
>>>>> implementation should use the reflection coming over all the fields a key
>>>>> has ignoring the ones that are marked with “transient” keyword. If a user
>>>>> wants to control the default resolver's logic then he can label all the
>>>>> fields that mustn’t be of a final has code value with “transient”. This has
>>>>> to be well-documented for sure.
>>>>> 
>>>>> Makes sense?
>>>>> 
>>>>> —
>>>>> Denis
>>>>> 
>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>>>> alexander.a.paschenko@gmail.com> wrote:
>>>>>> 
>>>>>> Hello Igniters,
>>>>>> 
>>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>>>> this discussion isn't new - please see its beginning here:
>>>>>> 
>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>>>>> at-the-same-partition-by-default-td8042.html
>>>>>> 
>>>>>> Still, I'm creating this new thread to make getting to the final
>>>>>> solution as simple and fast as possible.
>>>>>> 
>>>>>> I remind everyone that the approach that has got the least critics was
>>>>>> the one proposed by Vladimir Ozerov:
>>>>>> 
>>>>>> <quote>
>>>>>> I think we can do the following:
>>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>>>>>> 4) Add default implementation which will auto-generate hash code. *Print
>>>>> a
>>>>>> warning when auto-generation occurs*, so that user is aware that he is
>>>>>> likely to have problems with normal GETs/PUTs.
>>>>>> 5) Add another implementation which will use encoded string to calculate
>>>>> a
>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>>> Originally proposed by Yakov some time ago.
>>>>>> </quote>
>>>>>> 
>>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>>> with that list, we simply calculate hash code just like IDE does -
>>>>>> with all its bit shifts and additions.
>>>>>> 
>>>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>>>> Any objections?
>>>>>> 
>>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>>> exception on cache code absence and that we might generate it as the
>>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>>> random code only in context of SQL, but throw exception for keys
>>>>>> without codes on ordinary put?
>>>>>> 
>>>>>> And yes, built-in hash codes for JDK types are supported as well as
>>>>>> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>>>>>> for the flag and its presence check).
>>>>>> 
>>>>>> - Alex
>>>>> 
>>>>> 
>>

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

Denis,

That's not what I was asking about.
Currently DML implementation allows for dymanic instantiation of keys,
in other words, user does not have to provide value for object-typed
_key column - instead, he may supply just field values based on which
_key will be dynamically instantiated/binary built. And that's the
whole point of this discussion as I see it: what to do when we've
binary built classless key that we build ourselves inside SQL engine
and don't know how to compute hash code for it?

- Alex

2016-09-28 19:48 GMT+03:00 Denis Magda <dm...@gridgain.com>:
> Alexander,
>
> As I guess if we have a key without a class then it will be constructed using a BinaryBuilder instance and it’s user responsibility to set the has code at the end with BinaryBuilder.hasCode method. Sure, all this cases must be well-documented in both Java Doc API and Apache Ignite documentation.
>
> —
> Denis
>
>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <al...@gmail.com> wrote:
>>
>> Dmitry, Denis,
>>
>> OK, but I think it's necessary to address also the cases when there's
>> no actual class for the key, and its fields are simply declared in
>> XML. In this case, there are no fields to be marked transient. What do
>> we do then? List transient fields in XML separately?
>>
>> - Alex
>>
>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>> Agree with Denis.
>>>
>>>   - by default, all non-transient key fields should participate in the
>>>   hashcode generation
>>>   - when working on DDL, then the primary key fields should participate in
>>>   the hashcode
>>>   - we should add a resolver to override the default behavior (please
>>>   propose the interface in Jira)
>>>   - we should print out a warning, only once per type, the the hashcode
>>>   has been automatically generated based on which fields and which formula
>>>
>>> D.
>>>
>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>>>
>>>> Hi Alexander,
>>>>
>>>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>>>> one important thing. Binary objects were designed to address the following
>>>> disadvantages a regular serializer, like optimized marshaller, has:
>>>> necessity to deserialize an object on a server side every time it’s needed.
>>>> necessity to hold an object in both serialized and deserialized forms on
>>>> the server node.
>>>> necessity to restart the whole cluster each time an object version is
>>>> changed (new field is added or an old one is removed).
>>>> If it will be needed to perform step 3 for a default implementation of the
>>>> binary resolver just because the resolver has to consider new fields or
>>>> ignore old ones then such an implementation sucks. Overall, the default
>>>> implementation should use the reflection coming over all the fields a key
>>>> has ignoring the ones that are marked with “transient” keyword. If a user
>>>> wants to control the default resolver's logic then he can label all the
>>>> fields that mustn’t be of a final has code value with “transient”. This has
>>>> to be well-documented for sure.
>>>>
>>>> Makes sense?
>>>>
>>>> —
>>>> Denis
>>>>
>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>>> alexander.a.paschenko@gmail.com> wrote:
>>>>>
>>>>> Hello Igniters,
>>>>>
>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>>> this discussion isn't new - please see its beginning here:
>>>>>
>>>>> http://apache-ignite-developers.2346864.n4.nabble.
>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>>>> at-the-same-partition-by-default-td8042.html
>>>>>
>>>>> Still, I'm creating this new thread to make getting to the final
>>>>> solution as simple and fast as possible.
>>>>>
>>>>> I remind everyone that the approach that has got the least critics was
>>>>> the one proposed by Vladimir Ozerov:
>>>>>
>>>>> <quote>
>>>>> I think we can do the following:
>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>>>>> 4) Add default implementation which will auto-generate hash code. *Print
>>>> a
>>>>> warning when auto-generation occurs*, so that user is aware that he is
>>>>> likely to have problems with normal GETs/PUTs.
>>>>> 5) Add another implementation which will use encoded string to calculate
>>>> a
>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>> Originally proposed by Yakov some time ago.
>>>>> </quote>
>>>>>
>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>> with that list, we simply calculate hash code just like IDE does -
>>>>> with all its bit shifts and additions.
>>>>>
>>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>>> Any objections?
>>>>>
>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>> exception on cache code absence and that we might generate it as the
>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>> random code only in context of SQL, but throw exception for keys
>>>>> without codes on ordinary put?
>>>>>
>>>>> And yes, built-in hash codes for JDK types are supported as well as
>>>>> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>>>>> for the flag and its presence check).
>>>>>
>>>>> - Alex
>>>>
>>>>
>

Re: Default hash code generation strategy for new binary objects

Posted by Denis Magda <dm...@gridgain.com>.

Alexander,

As I guess if we have a key without a class then it will be constructed using a BinaryBuilder instance and it’s user responsibility to set the has code at the end with BinaryBuilder.hasCode method. Sure, all this cases must be well-documented in both Java Doc API and Apache Ignite documentation.

—
Denis

> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <al...@gmail.com> wrote:
> 
> Dmitry, Denis,
> 
> OK, but I think it's necessary to address also the cases when there's
> no actual class for the key, and its fields are simply declared in
> XML. In this case, there are no fields to be marked transient. What do
> we do then? List transient fields in XML separately?
> 
> - Alex
> 
> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>> Agree with Denis.
>> 
>>   - by default, all non-transient key fields should participate in the
>>   hashcode generation
>>   - when working on DDL, then the primary key fields should participate in
>>   the hashcode
>>   - we should add a resolver to override the default behavior (please
>>   propose the interface in Jira)
>>   - we should print out a warning, only once per type, the the hashcode
>>   has been automatically generated based on which fields and which formula
>> 
>> D.
>> 
>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>> 
>>> Hi Alexander,
>>> 
>>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>>> one important thing. Binary objects were designed to address the following
>>> disadvantages a regular serializer, like optimized marshaller, has:
>>> necessity to deserialize an object on a server side every time it’s needed.
>>> necessity to hold an object in both serialized and deserialized forms on
>>> the server node.
>>> necessity to restart the whole cluster each time an object version is
>>> changed (new field is added or an old one is removed).
>>> If it will be needed to perform step 3 for a default implementation of the
>>> binary resolver just because the resolver has to consider new fields or
>>> ignore old ones then such an implementation sucks. Overall, the default
>>> implementation should use the reflection coming over all the fields a key
>>> has ignoring the ones that are marked with “transient” keyword. If a user
>>> wants to control the default resolver's logic then he can label all the
>>> fields that mustn’t be of a final has code value with “transient”. This has
>>> to be well-documented for sure.
>>> 
>>> Makes sense?
>>> 
>>> —
>>> Denis
>>> 
>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>> alexander.a.paschenko@gmail.com> wrote:
>>>> 
>>>> Hello Igniters,
>>>> 
>>>> As DML support is near, it's critical that we agree on how we generate
>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>> this discussion isn't new - please see its beginning here:
>>>> 
>>>> http://apache-ignite-developers.2346864.n4.nabble.
>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>>> at-the-same-partition-by-default-td8042.html
>>>> 
>>>> Still, I'm creating this new thread to make getting to the final
>>>> solution as simple and fast as possible.
>>>> 
>>>> I remind everyone that the approach that has got the least critics was
>>>> the one proposed by Vladimir Ozerov:
>>>> 
>>>> <quote>
>>>> I think we can do the following:
>>>> 1) Add "has hash code" flag as Denis suggested.
>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
>>>> 4) Add default implementation which will auto-generate hash code. *Print
>>> a
>>>> warning when auto-generation occurs*, so that user is aware that he is
>>>> likely to have problems with normal GETs/PUTs.
>>>> 5) Add another implementation which will use encoded string to calculate
>>> a
>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>> Originally proposed by Yakov some time ago.
>>>> </quote>
>>>> 
>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>> list of the "fields" that participate in hash code evaluation, and
>>>> with that list, we simply calculate hash code just like IDE does -
>>>> with all its bit shifts and additions.
>>>> 
>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>> Any objections?
>>>> 
>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>> exception on cache code absence and that we might generate it as the
>>>> last resort. Do I understand you correctly that you suggest generating
>>>> random code only in context of SQL, but throw exception for keys
>>>> without codes on ordinary put?
>>>> 
>>>> And yes, built-in hash codes for JDK types are supported as well as
>>>> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>>>> for the flag and its presence check).
>>>> 
>>>> - Alex
>>> 
>>>

Re: Default hash code generation strategy for new binary objects

Posted by Alexander Paschenko <al...@gmail.com>.

Dmitry, Denis,

OK, but I think it's necessary to address also the cases when there's
no actual class for the key, and its fields are simply declared in
XML. In this case, there are no fields to be marked transient. What do
we do then? List transient fields in XML separately?

- Alex

2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> Agree with Denis.
>
>    - by default, all non-transient key fields should participate in the
>    hashcode generation
>    - when working on DDL, then the primary key fields should participate in
>    the hashcode
>    - we should add a resolver to override the default behavior (please
>    propose the interface in Jira)
>    - we should print out a warning, only once per type, the the hashcode
>    has been automatically generated based on which fields and which formula
>
> D.
>
> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:
>
>> Hi Alexander,
>>
>> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
>> one important thing. Binary objects were designed to address the following
>> disadvantages a regular serializer, like optimized marshaller, has:
>> necessity to deserialize an object on a server side every time it’s needed.
>> necessity to hold an object in both serialized and deserialized forms on
>> the server node.
>> necessity to restart the whole cluster each time an object version is
>> changed (new field is added or an old one is removed).
>> If it will be needed to perform step 3 for a default implementation of the
>> binary resolver just because the resolver has to consider new fields or
>> ignore old ones then such an implementation sucks. Overall, the default
>> implementation should use the reflection coming over all the fields a key
>> has ignoring the ones that are marked with “transient” keyword. If a user
>> wants to control the default resolver's logic then he can label all the
>> fields that mustn’t be of a final has code value with “transient”. This has
>> to be well-documented for sure.
>>
>> Makes sense?
>>
>> —
>> Denis
>>
>> > On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>> alexander.a.paschenko@gmail.com> wrote:
>> >
>> > Hello Igniters,
>> >
>> > As DML support is near, it's critical that we agree on how we generate
>> > hash codes for new keys in presence of binary marshaller. Actually,
>> > this discussion isn't new - please see its beginning here:
>> >
>> > http://apache-ignite-developers.2346864.n4.nabble.
>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
>> at-the-same-partition-by-default-td8042.html
>> >
>> > Still, I'm creating this new thread to make getting to the final
>> > solution as simple and fast as possible.
>> >
>> > I remind everyone that the approach that has got the least critics was
>> > the one proposed by Vladimir Ozerov:
>> >
>> > <quote>
>> > I think we can do the following:
>> > 1) Add "has hash code" flag as Denis suggested.
>> > 2) If object without a hash code is put to cache, throw an exception.
>> > 3) Add *BinaryEqualsHashCodeResolver *interface.
>> > 4) Add default implementation which will auto-generate hash code. *Print
>> a
>> > warning when auto-generation occurs*, so that user is aware that he is
>> > likely to have problems with normal GETs/PUTs.
>> > 5) Add another implementation which will use encoded string to calculate
>> a
>> > hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>> > Originally proposed by Yakov some time ago.
>> > </quote>
>> >
>> > After that, Sergi suggested that instead of a "formula" we keep just a
>> > list of the "fields" that participate in hash code evaluation, and
>> > with that list, we simply calculate hash code just like IDE does -
>> > with all its bit shifts and additions.
>> >
>> > I'm planning on settling down with this combined Vlad-Sergi approach.
>> > Any objections?
>> >
>> > And an extra question I have: Vlad, you suggest that we both throw an
>> > exception on cache code absence and that we might generate it as the
>> > last resort. Do I understand you correctly that you suggest generating
>> > random code only in context of SQL, but throw exception for keys
>> > without codes on ordinary put?
>> >
>> > And yes, built-in hash codes for JDK types are supported as well as
>> > items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
>> > for the flag and its presence check).
>> >
>> > - Alex
>>
>>

Re: Default hash code generation strategy for new binary objects

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Agree with Denis.

   - by default, all non-transient key fields should participate in the
   hashcode generation
   - when working on DDL, then the primary key fields should participate in
   the hashcode
   - we should add a resolver to override the default behavior (please
   propose the interface in Jira)
   - we should print out a warning, only once per type, the the hashcode
   has been automatically generated based on which fields and which formula

D.

On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dm...@gridgain.com> wrote:

> Hi Alexander,
>
> Vladimir’s proposal sounds reasonable to me. However we must keep in mind
> one important thing. Binary objects were designed to address the following
> disadvantages a regular serializer, like optimized marshaller, has:
> necessity to deserialize an object on a server side every time it’s needed.
> necessity to hold an object in both serialized and deserialized forms on
> the server node.
> necessity to restart the whole cluster each time an object version is
> changed (new field is added or an old one is removed).
> If it will be needed to perform step 3 for a default implementation of the
> binary resolver just because the resolver has to consider new fields or
> ignore old ones then such an implementation sucks. Overall, the default
> implementation should use the reflection coming over all the fields a key
> has ignoring the ones that are marked with “transient” keyword. If a user
> wants to control the default resolver's logic then he can label all the
> fields that mustn’t be of a final has code value with “transient”. This has
> to be well-documented for sure.
>
> Makes sense?
>
> —
> Denis
>
> > On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
> alexander.a.paschenko@gmail.com> wrote:
> >
> > Hello Igniters,
> >
> > As DML support is near, it's critical that we agree on how we generate
> > hash codes for new keys in presence of binary marshaller. Actually,
> > this discussion isn't new - please see its beginning here:
> >
> > http://apache-ignite-developers.2346864.n4.nabble.
> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
> at-the-same-partition-by-default-td8042.html
> >
> > Still, I'm creating this new thread to make getting to the final
> > solution as simple and fast as possible.
> >
> > I remind everyone that the approach that has got the least critics was
> > the one proposed by Vladimir Ozerov:
> >
> > <quote>
> > I think we can do the following:
> > 1) Add "has hash code" flag as Denis suggested.
> > 2) If object without a hash code is put to cache, throw an exception.
> > 3) Add *BinaryEqualsHashCodeResolver *interface.
> > 4) Add default implementation which will auto-generate hash code. *Print
> a
> > warning when auto-generation occurs*, so that user is aware that he is
> > likely to have problems with normal GETs/PUTs.
> > 5) Add another implementation which will use encoded string to calculate
> a
> > hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
> > Originally proposed by Yakov some time ago.
> > </quote>
> >
> > After that, Sergi suggested that instead of a "formula" we keep just a
> > list of the "fields" that participate in hash code evaluation, and
> > with that list, we simply calculate hash code just like IDE does -
> > with all its bit shifts and additions.
> >
> > I'm planning on settling down with this combined Vlad-Sergi approach.
> > Any objections?
> >
> > And an extra question I have: Vlad, you suggest that we both throw an
> > exception on cache code absence and that we might generate it as the
> > last resort. Do I understand you correctly that you suggest generating
> > random code only in context of SQL, but throw exception for keys
> > without codes on ordinary put?
> >
> > And yes, built-in hash codes for JDK types are supported as well as
> > items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
> > for the flag and its presence check).
> >
> > - Alex
>
>

Re: Default hash code generation strategy for new binary objects

Posted by Denis Magda <dm...@gridgain.com>.

Hi Alexander,

Vladimir’s proposal sounds reasonable to me. However we must keep in mind one important thing. Binary objects were designed to address the following disadvantages a regular serializer, like optimized marshaller, has:
necessity to deserialize an object on a server side every time it’s needed.
necessity to hold an object in both serialized and deserialized forms on the server node.
necessity to restart the whole cluster each time an object version is changed (new field is added or an old one is removed).
If it will be needed to perform step 3 for a default implementation of the binary resolver just because the resolver has to consider new fields or ignore old ones then such an implementation sucks. Overall, the default implementation should use the reflection coming over all the fields a key has ignoring the ones that are marked with “transient” keyword. If a user wants to control the default resolver's logic then he can label all the fields that mustn’t be of a final has code value with “transient”. This has to be well-documented for sure.

Makes sense?

—
Denis

> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <al...@gmail.com> wrote:
> 
> Hello Igniters,
> 
> As DML support is near, it's critical that we agree on how we generate
> hash codes for new keys in presence of binary marshaller. Actually,
> this discussion isn't new - please see its beginning here:
> 
> http://apache-ignite-developers.2346864.n4.nabble.com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-at-the-same-partition-by-default-td8042.html
> 
> Still, I'm creating this new thread to make getting to the final
> solution as simple and fast as possible.
> 
> I remind everyone that the approach that has got the least critics was
> the one proposed by Vladimir Ozerov:
> 
> <quote>
> I think we can do the following:
> 1) Add "has hash code" flag as Denis suggested.
> 2) If object without a hash code is put to cache, throw an exception.
> 3) Add *BinaryEqualsHashCodeResolver *interface.
> 4) Add default implementation which will auto-generate hash code. *Print a
> warning when auto-generation occurs*, so that user is aware that he is
> likely to have problems with normal GETs/PUTs.
> 5) Add another implementation which will use encoded string to calculate a
> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
> Originally proposed by Yakov some time ago.
> </quote>
> 
> After that, Sergi suggested that instead of a "formula" we keep just a
> list of the "fields" that participate in hash code evaluation, and
> with that list, we simply calculate hash code just like IDE does -
> with all its bit shifts and additions.
> 
> I'm planning on settling down with this combined Vlad-Sergi approach.
> Any objections?
> 
> And an extra question I have: Vlad, you suggest that we both throw an
> exception on cache code absence and that we might generate it as the
> last resort. Do I understand you correctly that you suggest generating
> random code only in context of SQL, but throw exception for keys
> without codes on ordinary put?
> 
> And yes, built-in hash codes for JDK types are supported as well as
> items 1-2 from Vlad's list (there's already fixed issue of IGNITE-3633
> for the flag and its presence check).
> 
> - Alex