Posted to issues@ignite.apache.org by "Anton Kalashnikov (Jira)" <ji...@apache.org> on 2020/05/27 10:36:00 UTC

[jira] [Updated] (IGNITE-13080) Incorrect hash calculation for binaryObject in case of deduplication

     [ https://issues.apache.org/jira/browse/IGNITE-13080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Kalashnikov updated IGNITE-13080:
---------------------------------------
    Description: 
Suppose we have the following two classes (the implementation of SubKey doesn't matter here):
{noformat}
public static class Key {   
        private SubKey subKey;
}

public static class Value {
        private SubKey subKey;
        private Key key;
}
{noformat}
If the same subKey instance is referenced from both Key and Value, and we do the following:
{noformat}
SubKey subKey = new SubKey();
Key key = new Key(subKey);
Value value = new Value(subKey, key);

cache.put(key, value); 

assert cache.size() == 1; //true

// cache is presumably a keep-binary view here, since get() returns a BinaryObject
BinaryObject keyAsBinaryObject = cache.get(key).field("key");

cache.put(keyAsBinaryObject, value); // cache.size() should still be 1, but it becomes 2

assert cache.size() == 1; // fails: the cache now holds two different keys, which is wrong
{noformat}
We get two different records instead of one.

Reason:
When we put a raw Key object into the cache, Ignite converts it to a binary object (literally a byte array), then calculates a hash over this byte array and stores it in the object.

When we put a raw Value object, the same thing happens, but since Value holds two references to the same object (subKey), deduplication occurs. The first time the serializer meets an object, it writes the object as is and remembers its location. If it meets the same object again, instead of writing all of its bytes a second time, it marks the position as a HANDLE and records only the offset at which the already saved object can be found.
After that, we extract an object (Key) from the BinaryObject of Value. The result is not a new BinaryObject with its own byte array. Instead, we get a BinaryObject that shares the same byte array and carries an offset showing where the requested value (Key) starts. When we try to store this object in the cache, Ignite does it incorrectly. First, the byte array contains a HANDLE mark with an offset instead of the real bytes of the inner object, which is already wrong. On top of that, the hash is also calculated incorrectly.
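
The effect described above can be sketched with a toy serializer. This is only an illustration under stated assumptions, not Ignite's real binary format: the OBJ and HANDLE marker values, the one-byte offsets, the Writer class and the use of Arrays.hashCode as the hash are all hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model of handle-based deduplication; NOT Ignite's real binary format. */
public class HandleHashSketch {
    static final byte OBJ = 1;    // hypothetical marker: object body follows
    static final byte HANDLE = 2; // hypothetical marker: offset of an earlier copy follows

    static final class SubKey {
        final int id;
        SubKey(int id) { this.id = id; }
    }

    static final class Key {
        final SubKey subKey;
        Key(SubKey subKey) { this.subKey = subKey; }
    }

    static final class Value {
        final SubKey subKey;
        final Key key;
        Value(SubKey subKey, Key key) { this.subKey = subKey; this.key = key; }
    }

    /** Writes objects into one buffer and remembers where each SubKey instance was first written. */
    static final class Writer {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final Map<Object, Integer> firstOffset = new IdentityHashMap<>();

        void writeSubKey(SubKey s) {
            Integer seenAt = firstOffset.get(s);
            if (seenAt != null) {           // second occurrence: store only a back-reference
                out.write(HANDLE);
                out.write(seenAt);
                return;
            }
            firstOffset.put(s, out.size()); // first occurrence: store the full body
            out.write(OBJ);
            out.write(s.id);
        }

        void writeKey(Key k) {
            out.write(OBJ);
            writeSubKey(k.subKey);
        }

        void writeValue(Value v) {          // field order: subKey first, then key
            out.write(OBJ);
            writeSubKey(v.subKey);
            writeKey(v.key);
        }
    }

    /** Bytes of a Key serialized on its own, as when it is used directly as a cache key. */
    static byte[] standaloneKeyBytes(Key k) {
        Writer w = new Writer();
        w.writeKey(k);
        return w.out.toByteArray();
    }

    /** Bytes of the Key as it sits inside the serialized Value. */
    static byte[] embeddedKeyBytes(Value v) {
        Writer w = new Writer();
        w.writeValue(v);
        byte[] all = w.out.toByteArray();
        // The Key starts after the Value marker (1 byte) and the inlined SubKey (2 bytes).
        return Arrays.copyOfRange(all, 3, all.length);
    }

    public static void main(String[] args) {
        SubKey subKey = new SubKey(7);
        Key key = new Key(subKey);
        Value value = new Value(subKey, key);

        byte[] standalone = standaloneKeyBytes(key);
        byte[] embedded = embeddedKeyBytes(value);

        System.out.println(Arrays.toString(standalone)); // [1, 1, 7]
        System.out.println(Arrays.toString(embedded));   // [1, 2, 1]
        System.out.println(Arrays.hashCode(standalone) == Arrays.hashCode(embedded)); // false
    }
}
```

In this model the same logical Key yields different bytes depending on whether its subKey was inlined or replaced by a HANDLE, so any hash computed over the raw bytes differs as well.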

Problem:
Right now, Ignite is not able to store a BinaryObject that contains a HANDLE. As far as I understand, this is not easy to fix. Maybe it makes sense to explicitly forbid working with BinaryObjects like the one described above, but of course this is open to discussion.

Workaround:
We can change the order of the fields in Value, like this:
{noformat}
public static class Value {       
        private Key key;
        private SubKey subKey;
}
{noformat}
After that, subKey is inlined inside key, and the subKey field of Value is represented as a HANDLE.
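
Why the field order matters can be shown with a small generic model. This is a sketch under assumptions, not Ignite's actual layout: objects are modelled as Object[] arrays of fields, and the INT, OBJ and HANDLE markers, the Writer class and the one-byte offsets are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model: how field order decides which occurrence becomes a handle. */
public class FieldOrderSketch {
    static final byte INT = 0, OBJ = 1, HANDLE = 2; // hypothetical markers

    static final class Writer {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        // instance -> {start offset, end offset} of its first (inlined) occurrence
        final Map<Object, int[]> span = new IdentityHashMap<>();

        void write(Object o) {
            if (o instanceof Integer) {     // leaf field
                out.write(INT);
                out.write((Integer) o);
                return;
            }
            int[] seen = span.get(o);
            if (seen != null) {             // repeated instance: write a back-reference only
                out.write(HANDLE);
                out.write(seen[0]);
                return;
            }
            int[] s = {out.size(), -1};
            span.put(o, s);
            Object[] fields = (Object[]) o; // composite object: marker, field count, fields
            out.write(OBJ);
            out.write(fields.length);
            for (Object f : fields) write(f);
            s[1] = out.size();
        }
    }

    static byte[] serialize(Object root) {
        Writer w = new Writer();
        w.write(root);
        return w.out.toByteArray();
    }

    /** Bytes that target occupies inside the serialized root. */
    static byte[] embedded(Object root, Object target) {
        Writer w = new Writer();
        w.write(root);
        int[] s = w.span.get(target);
        return Arrays.copyOfRange(w.out.toByteArray(), s[0], s[1]);
    }

    public static void main(String[] args) {
        Object[] subKey = {7};
        Object[] key = {subKey};
        Object[] valueSubKeyFirst = {subKey, key}; // original field order
        Object[] valueKeyFirst = {key, subKey};    // reordered as in the workaround

        byte[] standalone = serialize(key);
        System.out.println(Arrays.equals(standalone, embedded(valueSubKeyFirst, key))); // false
        System.out.println(Arrays.equals(standalone, embedded(valueKeyFirst, key)));    // true
    }
}
```

With key written first, its bytes inside Value contain no back-reference and match the bytes of a Key serialized on its own. The handle moves to the subKey field of Value instead.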

We can also rebuild the object like this:
{noformat}
keyAsBinaryObject.toBuilder().build();
{noformat}
During this procedure, every HANDLE is restored to a real object.
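
Conceptually, such a rebuild copies the object into a fresh byte array and inlines whatever each HANDLE points at. The following is a minimal sketch of that idea on a toy format, not the code behind toBuilder().build(): the INT, OBJ and HANDLE markers, the one-byte offsets and the rebuild helper are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Toy model of rebuilding an embedded object so that it no longer contains handles. */
public class RebuildSketch {
    static final byte INT = 0, OBJ = 1, HANDLE = 2; // hypothetical markers

    /** Copies the object at buf[pos] into out, inlining handles; returns the position after it. */
    static int copyExpanded(byte[] buf, int pos, ByteArrayOutputStream out) {
        switch (buf[pos]) {
            case INT: {
                out.write(INT);
                out.write(buf[pos + 1]);
                return pos + 2;
            }
            case HANDLE: {
                // Follow the stored offset and copy the referenced object in place of the handle.
                copyExpanded(buf, buf[pos + 1], out);
                return pos + 2;
            }
            case OBJ: {
                int fieldCount = buf[pos + 1];
                out.write(OBJ);
                out.write(fieldCount);
                int p = pos + 2;
                for (int i = 0; i < fieldCount; i++) p = copyExpanded(buf, p, out);
                return p;
            }
            default:
                throw new IllegalArgumentException("Unknown marker: " + buf[pos]);
        }
    }

    /** Returns a self-contained copy of the object that starts at offset start in buf. */
    static byte[] rebuild(byte[] buf, int start) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyExpanded(buf, start, out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Value {subKey, key}: subKey is inlined at offset 2, key starts at offset 6,
        // and the subKey inside key is a HANDLE pointing back to offset 2.
        byte[] value = {OBJ, 2, OBJ, 1, INT, 7, OBJ, 1, HANDLE, 2};

        byte[] key = rebuild(value, 6);
        System.out.println(Arrays.toString(key)); // [1, 1, 1, 1, 0, 7]
    }
}
```

The rebuilt array no longer depends on the surrounding Value buffer, so hashing it gives the same result as hashing a Key that was serialized on its own. In this toy version an instance referenced twice inside the rebuilt object would simply be inlined twice.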

Test that reproduces this case: org.apache.ignite.internal.processors.cache.IgniteCachePutKeyAttachedBinaryObjectTest#testKeyRefersToSubkeyInValue (once https://issues.apache.org/jira/browse/IGNITE-12911 is resolved).



> Incorrect hash calculation for binaryObject in case of deduplication
> --------------------------------------------------------------------
>
>                 Key: IGNITE-13080
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13080
>             Project: Ignite
>          Issue Type: Bug
>          Components: binary
>            Reporter: Anton Kalashnikov
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)