Posted to issues@ignite.apache.org by "Alexandr Kuramshin (JIRA)" <ji...@apache.org> on 2017/09/26 06:06:00 UTC

[jira] [Created] (IGNITE-6499) Compact NULL fields binary representation

Alexandr Kuramshin created IGNITE-6499:
------------------------------------------

             Summary: Compact NULL fields binary representation
                 Key: IGNITE-6499
                 URL: https://issues.apache.org/jira/browse/IGNITE-6499
             Project: Ignite
          Issue Type: Improvement
          Components: binary
    Affects Versions: 2.1
            Reporter: Alexandr Kuramshin
            Assignee: Vladimir Ozerov


The current compact footer implementation writes an offset for every field in the schema. Depending on the serialized size of the object, each offset takes 1, 2 or 4 bytes.

Imagine an object in which some 100 fields are null. Their offsets alone add 100 to 400 bytes of overhead. For a middle-sized object (about 260 bytes) this doubles the memory usage. For a small object (about 40 bytes) the memory usage increases by a factor of 3 or 4.

Two optimizations are proposed. Both should be implemented, and the more compact one should be selected dynamically when the object is marshalled.

1. Write the field ID and offset only for the non-null fields in the footer.

2. Write a footer header, then field offsets only for the non-null fields, as follows:

[0] bit mask for first 8 fields, 0 - field is null, 1 - field is non-null
[1] cumulative sum of "1" bits
[2] bit mask for the next 8 fields
[3] cumulative sum of "1" bits
... and so on
[N1...N2] offset of first non-null field
[N3...N4] offset of next non-null field
... and so on

To read a field from 0 to 7, we read the first footer byte, step through its bits, and either find the offset index of the non-null field or find that the field is null.

To read a field from 8 onwards, we read two footer bytes, take the start offset from the first of them, and then step through the bits to find the offset index of the non-null field or find that the field is null.

This supports up to 255 non-null fields per binary object.
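
The lookup described above could be sketched in Java roughly as follows. This is only an illustration, not Ignite code: the class and method names are hypothetical. It also assumes that the lowest bit of each mask byte stands for the first field of its group, and that each cumulative-sum byte counts the non-null fields up to and including its own group. Integer.bitCount is used in place of stepping through the bits one by one.

```java
/** Hypothetical sketch of the bit-mask footer header (option 2); not Ignite code. */
final class BitmaskFooterSketch {
    /**
     * Builds the footer header: for every group of 8 fields, one mask byte
     * followed by one byte holding the cumulative count of non-null fields.
     */
    static byte[] writeHeader(boolean[] nonNull) {
        int groups = (nonNull.length + 7) / 8;
        byte[] hdr = new byte[groups * 2];
        int cnt = 0;

        for (int g = 0; g < groups; g++) {
            int mask = 0;

            for (int b = 0; b < 8; b++) {
                int field = g * 8 + b;

                if (field < nonNull.length && nonNull[field]) {
                    mask |= 1 << b; // assumption: lowest bit = first field of the group
                    cnt++;
                }
            }

            // The cumulative sum is stored in one byte, hence the 255 limit.
            if (cnt > 255)
                throw new IllegalArgumentException("More than 255 non-null fields");

            hdr[2 * g] = (byte) mask;
            hdr[2 * g + 1] = (byte) cnt; // assumption: inclusive of this group
        }

        return hdr;
    }

    /** Returns the index into the offsets table, or -1 if the field is null. */
    static int offsetIndex(byte[] hdr, int fieldOrder) {
        int g = fieldOrder / 8;
        int bit = fieldOrder % 8;
        int mask = hdr[2 * g] & 0xFF;

        if ((mask & (1 << bit)) == 0)
            return -1; // field is null, no offset stored

        // Non-null fields in all preceding groups (none for the first group).
        int start = (g == 0) ? 0 : (hdr[2 * g - 1] & 0xFF);

        // Non-null fields that precede this one inside its own group.
        int before = Integer.bitCount(mask & ((1 << bit) - 1));

        return start + before;
    }
}
```

With this layout a field in the first group needs one header byte, and any later field needs two adjacent bytes (the previous group's cumulative sum and its own mask), which matches the two read paths described above.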

The overhead would be only 24 bytes per 100 null fields, instead of 200 bytes for a middle-sized object.
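
To make the size comparison concrete, here is a rough Java sketch that estimates the footer size of the current layout and of the two proposed ones, and picks the smallest. All names are hypothetical and this is not Ignite code. It assumes a 4-byte field ID for option 1 and a 2-byte header entry (mask plus cumulative sum) per group of 8 fields for option 2. It ignores the 255 non-null field limit of option 2.

```java
/** Hypothetical footer size estimates for the three layouts; not Ignite code. */
final class FooterSizeSketch {
    /** Assumption: a field ID is stored as a 4-byte int. */
    static final int FIELD_ID_SIZE = 4;

    /** Current compact footer: one offset per schema field, null or not. */
    static int fullFooter(int fields, int offsetSize) {
        return fields * offsetSize;
    }

    /** Option 1: field ID plus offset, written for non-null fields only. */
    static int idOffsetPairs(int nonNull, int offsetSize) {
        return nonNull * (FIELD_ID_SIZE + offsetSize);
    }

    /** Option 2: two header bytes per group of 8 fields, then non-null offsets. */
    static int bitmaskFooter(int fields, int nonNull, int offsetSize) {
        int groups = (fields + 7) / 8;

        return 2 * groups + nonNull * offsetSize;
    }

    /** Picks the smallest layout, as a marshaller could do per object. */
    static String cheapest(int fields, int nonNull, int offsetSize) {
        int full = fullFooter(fields, offsetSize);
        int pairs = idOffsetPairs(nonNull, offsetSize);
        int mask = bitmaskFooter(fields, nonNull, offsetSize);

        if (pairs <= mask && pairs <= full)
            return "ID_OFFSET_PAIRS";

        return mask <= full ? "BITMASK" : "FULL";
    }
}
```

Under these assumptions the header for 100 fields comes to 26 bytes (13 groups of 2 bytes), close to the 24 bytes quoted above, against 200 bytes for 100 two-byte offsets. Option 1 tends to win when very few fields are set, and option 2 when a larger share of the fields is non-null.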



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)