Posted to user@ignite.apache.org by Luke Burton <lu...@icloud.com> on 2017/03/15 17:34:24 UTC

Problems storing RoaringBitmaps

Hi there,

I'm storing RoaringBitmaps in Ignite and have encountered an odd serialization issue. Please forgive the samples below being in Clojure; I've written a small wrapper around most of the Ignite APIs I use. I hope you can still catch the gist of what I'm doing :)

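(import '(java.io ByteArrayOutputStream DataOutputStream)
        '(org.roaringbitmap RoaringBitmap))

;; i/ is my own small wrapper around the Ignite cache API.
(let [inst (i/instance "attribute_bitmaps")
      ;; A bitmap holding a single value.
      bmp  (doto (RoaringBitmap.)
             (.add 9999999))
      ;; Serialize a bitmap using RoaringBitmap's own format.
      srz  (fn [it]
             (let [b (ByteArrayOutputStream.)]
               (with-open [o (DataOutputStream. b)]
                 (.serialize it o))
               (.toByteArray b)))]

  (doto inst
    (i/clear)
    (i/put "test" bmp))

  ;; Print the bytes before and after the round trip through Ignite.
  (byte-streams/print-bytes (srz bmp))
  (byte-streams/print-bytes (srz (i/get inst "test"))))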

Here I'm just creating a bitmap and storing it in a cache, then printing the bytes of the object I stored alongside the bytes of what comes back out of Ignite.

I get the following warning in the logs:

10:16:45.730 [djura.local ~ nREPL-worker-3 ~ o.a.i.internal.binary.BinaryContext] Class "org.roaringbitmap.RoaringBitmap" cannot be serialized using BinaryMarshaller because it either implements Externalizable interface or have writeObject/readObject methods. OptimizedMarshaller will be used instead and class instances will be deserialized on the server. Please ensure that all nodes have this class in classpath. To enable binary serialization either implement Binarylizable interface or set explicit serializer using BinaryTypeConfiguration.setSerializer() method. 

And, really strangely, I get the same number of bytes back, but some of the bytes at the end have been zeroed out (the first dump is correct; the second went through Ignite):

3A 30 00 00 01 00 00 00  98 00 00 00 10 00 00 00      .0..............
7F 96                                                  ..
3A 30 00 00 01 00 00 00  98 00 00 00 10 00 00 00      .0..............
00 00                                                  ..

The resulting object is still a valid RoaringBitmap, except all the values in the bitmap are wrong! 
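To make the dumps above concrete, here is a minimal Java sketch that decodes them by hand. It assumes RoaringBitmap's portable serialization layout for a bitmap without run containers: a little-endian cookie (12346, i.e. `3A 30 00 00`), a 32-bit container count, a 16-bit key and a 16-bit "cardinality minus one" per container, a 32-bit byte offset per container, and then each array container's 16-bit values. The class and method names (`RoaringDumpDecoder`, `decode`, `hex`) are hypothetical, and only array containers are handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class RoaringDumpDecoder {
    // Assumed cookie for the portable format without run containers.
    static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;

    // Decodes a serialized bitmap that contains only array containers.
    static List<Integer> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int cookie = buf.getInt();
        if (cookie != SERIAL_COOKIE_NO_RUNCONTAINER) {
            throw new IllegalArgumentException("unsupported cookie: " + cookie);
        }
        int containers = buf.getInt();
        int[] keys = new int[containers];
        int[] cardinalities = new int[containers];
        for (int i = 0; i < containers; i++) {
            keys[i] = Short.toUnsignedInt(buf.getShort());
            // Stored as cardinality - 1.
            cardinalities[i] = Short.toUnsignedInt(buf.getShort()) + 1;
        }
        int[] offsets = new int[containers];
        for (int i = 0; i < containers; i++) {
            offsets[i] = buf.getInt(); // absolute offset from the start of the stream
        }
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < containers; i++) {
            buf.position(offsets[i]);
            for (int j = 0; j < cardinalities[i]; j++) {
                int low = Short.toUnsignedInt(buf.getShort());
                // Container key supplies the high 16 bits of each value.
                values.add((keys[i] << 16) | low);
            }
        }
        return values;
    }

    // Parses a whitespace-separated hex dump such as "3A 30 00 00".
    static byte[] hex(String s) {
        String[] parts = s.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] before = hex("3A 30 00 00 01 00 00 00  98 00 00 00 10 00 00 00  7F 96");
        byte[] after = hex("3A 30 00 00 01 00 00 00  98 00 00 00 10 00 00 00  00 00");
        System.out.println("before: " + decode(before)); // before: [9999999]
        System.out.println("after:  " + decode(after));  // after:  [9961472]
    }
}
```

Under that assumption, the header survives the round trip (one container with key 0x98, i.e. 152), but the low 16 bits of the only value are lost, so 9999999 comes back as 152 << 16 = 9961472.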

I'm guessing from this and the logs that OptimizedMarshaller is being used instead of BinaryMarshaller, and that OptimizedMarshaller is not copying the internal fields of the class correctly.

Would the recommended approach here be to create a custom class that extends RoaringBitmap and implements Binarylizable? I'm not sure Binarylizable is suitable when I don't control the source of the class in question. I have no knowledge of the class's internal fields and really just want to ensure that it survives the round trip through Ignite using its own internal serialization mechanism.

Luke.


Re: Problems storing RoaringBitmaps

Posted by Denis Magda <dm...@apache.org>.
Hi Luke,

Is there any chance you can create a similar test using Java so that we can run it on our side?

In the meantime, the warning below just says that your object can’t be serialized into Ignite’s binary format:

>>> 10:16:45.730 [djura.local ~ nREPL-worker-3 ~ o.a.i.internal.binary.BinaryContext] Class "org.roaringbitmap.RoaringBitmap" cannot be serialized using BinaryMarshaller because it either implements Externalizable interface or have writeObject/readObject methods. OptimizedMarshaller will be used instead and class instances will be deserialized on the server. Please ensure that all nodes have this class in classpath. To enable binary serialization either implement Binarylizable interface or set explicit serializer using BinaryTypeConfiguration.setSerializer() method. 

Usually this happens when an object’s class implements the Externalizable interface or overrides the writeObject/readObject methods. Refer to the “restrictions” callout on this page:
https://apacheignite.readme.io/docs/binary-marshaller#section-basic-concepts

In general, this warning is not severe. It just means that the class must be on the classpath of all cluster nodes if the object might be deserialized on the server side (during SQL query execution, for instance).
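The condition the warning describes can be approximated with plain reflection. The sketch below is not Ignite's actual check: `hasCustomJavaSerialization` is a hypothetical helper that flags a class if it implements `Externalizable` or declares the private `writeObject`/`readObject` hooks of Java serialization. For simplicity it inspects only the class itself, not its superclasses. RoaringBitmap implements `Externalizable`, which is presumably why it triggers the warning.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class CustomSerializationCheck {
    // Rough approximation: true if cls is Externalizable or declares the
    // private writeObject/readObject hooks used by Java serialization.
    static boolean hasCustomJavaSerialization(Class<?> cls) {
        if (Externalizable.class.isAssignableFrom(cls)) {
            return true;
        }
        return declaresPrivate(cls, "writeObject", ObjectOutputStream.class)
            || declaresPrivate(cls, "readObject", ObjectInputStream.class);
    }

    // Looks only at methods declared directly on cls.
    private static boolean declaresPrivate(Class<?> cls, String name, Class<?> param) {
        try {
            Method m = cls.getDeclaredMethod(name, param);
            return Modifier.isPrivate(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Example classes.
    public static class PlainPojo implements Serializable {
        int value;
    }

    public static class WithHooks implements Serializable {
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
        }
    }

    public static class ExternalizableThing implements Externalizable {
        public ExternalizableThing() {}

        @Override
        public void writeExternal(ObjectOutput out) {}

        @Override
        public void readExternal(ObjectInput in) {}
    }

    public static void main(String[] args) {
        System.out.println(hasCustomJavaSerialization(PlainPojo.class));           // false
        System.out.println(hasCustomJavaSerialization(WithHooks.class));           // true
        System.out.println(hasCustomJavaSerialization(ExternalizableThing.class)); // true
    }
}
```

Only a class like `PlainPojo` would be a candidate for BinaryMarshaller's default field-by-field handling; the other two fall back to OptimizedMarshaller unless a custom serializer is registered.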

> Well, this introduces other issues, as Clojure's immutable data structures rely on different semantics for hashCode. So now I get duplicate keys when I use BinaryMarshaller.


Why do you use this object as a key? Just in case, take a look here:
https://apacheignite.readme.io/docs/binary-marshaller#handling-hash-code-generation-and-equals-execution

—
Denis

> On Mar 15, 2017, at 5:19 PM, Luke Burton <lu...@icloud.com> wrote:
>> On Mar 15, 2017, at 11:24 AM, Luke Burton <luke_burton@icloud.com <ma...@icloud.com>> wrote:
>>> On Mar 15, 2017, at 10:34 AM, Luke Burton <luke_burton@icloud.com <ma...@icloud.com>> wrote:


Re: Problems storing RoaringBitmaps

Posted by Luke Burton <lu...@icloud.com>.
Well, this introduces other issues, as Clojure's immutable data structures rely on different semantics for hashCode. So now I get duplicate keys when I use BinaryMarshaller.

If I switch back to OptimizedMarshaller, the duplicate key problem goes away (I'll deal with having to do this another time). But now there is a new problem: the byte order is wrong, presumably due to RoaringBitmap's endianness: https://github.com/RoaringBitmap/RoaringBitmap/issues/47

You can see it clearly here in the last 32 bits:

3A 30 00 00 02 00 00 00  01 00 00 00 98 00 00 00      .0..............
18 00 00 00 1A 00 00 00  9F 86 7F 96                  ............
3A 30 00 00 02 00 00 00  01 00 00 00 98 00 00 00      .0..............
18 00 00 00 1A 00 00 00  86 9F 96 7F                  ............
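Assuming, as in the first message, that the first dump is the one written directly and the second came back from Ignite, both share the same header: two containers with keys 0x0001 and 0x0098, each holding one value. The sketch below (hypothetical class `ByteSwapDemo`) reads the final four bytes as 16-bit little-endian values, which is how RoaringBitmap's portable format is assumed to store them, and combines each with its container key. It also checks that the returned bytes are exactly the original 16-bit values with their two bytes swapped.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteSwapDemo {
    // Reads consecutive unsigned 16-bit values in the given byte order.
    static int[] readShorts(byte[] bytes, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(order);
        int[] out = new int[bytes.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = Short.toUnsignedInt(buf.getShort());
        }
        return out;
    }

    // Rebuilds 32-bit values: container key in the high 16 bits, stored value in the low 16.
    static int[] combine(int[] keys, int[] lows) {
        int[] out = new int[lows.length];
        for (int i = 0; i < lows.length; i++) {
            out[i] = (keys[i] << 16) | lows[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {0x0001, 0x0098}; // "01 00" and "98 00" in the header
        byte[] written = {(byte) 0x9F, (byte) 0x86, (byte) 0x7F, (byte) 0x96};
        byte[] returned = {(byte) 0x86, (byte) 0x9F, (byte) 0x96, (byte) 0x7F};

        // Prints [99999, 9999999]
        System.out.println(Arrays.toString(combine(keys, readShorts(written, ByteOrder.LITTLE_ENDIAN))));
        // Prints [106374, 9994134]
        System.out.println(Arrays.toString(combine(keys, readShorts(returned, ByteOrder.LITTLE_ENDIAN))));
        // Prints true: each returned byte pair is the original pair swapped.
        System.out.println(Arrays.equals(readShorts(written, ByteOrder.BIG_ENDIAN),
                                         readShorts(returned, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Under these assumptions, a bitmap that went in as {99999, 9999999} comes back as {106374, 9994134}: still a valid bitmap, but with every value changed.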

Frustrating! Is there a way to customize the serializer for OptimizedMarshaller, or otherwise stop it from twiddling these bits?


> On Mar 15, 2017, at 11:24 AM, Luke Burton <lu...@icloud.com> wrote:
>> On Mar 15, 2017, at 10:34 AM, Luke Burton <lu...@icloud.com> wrote:


Re: Problems storing RoaringBitmaps

Posted by Luke Burton <lu...@icloud.com>.
I did just manage to implement a BinarySerializer and register it as a custom serializer; it seems to work, pending some more tests. Again, I'm curious to hear whether this approach is appropriate for this use case. Here's the Clojure version:

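(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  DataInputStream DataOutputStream)
        '(org.apache.ignite.binary BinarySerializer)
        '(org.roaringbitmap RoaringBitmap))

(defn roaring-serializer []
  (reify BinarySerializer
    (writeBinary [this o binaryWriter]
      ;; o is the RoaringBitmap being written; store it in its own
      ;; serialized form as a single byte-array field.
      (let [byte-stream (ByteArrayOutputStream.)]
        (with-open [out-stream (DataOutputStream. byte-stream)]
          (.serialize ^RoaringBitmap o out-stream))
        (.writeByteArray binaryWriter "val" (.toByteArray byte-stream))))

    (readBinary [this o binaryReader]
      ;; o contains an empty RoaringBitmap; fill it from the stored bytes.
      (let [raw-bytes   (.readByteArray binaryReader "val")
            byte-stream (ByteArrayInputStream. raw-bytes)]
        (with-open [in-stream (DataInputStream. byte-stream)]
          (.deserialize ^RoaringBitmap o in-stream))))))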




> On Mar 15, 2017, at 10:34 AM, Luke Burton <lu...@icloud.com> wrote: