You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@ignite.apache.org by "Steve Hostettler (Jira)" <ji...@apache.org> on 2020/05/21 20:53:00 UTC

[jira] [Assigned] (IGNITE-6499) Compact NULL fields binary representation

     [ https://issues.apache.org/jira/browse/IGNITE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Hostettler reassigned IGNITE-6499:
----------------------------------------

    Assignee: Steve Hostettler

> Compact NULL fields binary representation
> -----------------------------------------
>
>                 Key: IGNITE-6499
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6499
>             Project: Ignite
>          Issue Type: Improvement
>          Components: binary
>    Affects Versions: 2.1
>            Reporter: Alexandr Kuramshin
>            Assignee: Steve Hostettler
>            Priority: Major
>              Labels: iep-2
>
> Current compact footer implementation writes offset for the every field in schema. Depending on serialized size of an object offset may be 1, 2 or 4 bytes.
> Imagine an object with some 100 fields are null. It takes from 100 to 400 bytes overhead. For middle-sized objects (about 260 bytes) it doubles the memory usage. For a small-sized objects (about 40 bytes) the memory usage increased by factor 3 or 4.
> Proposed two optimizations, the both should be implemented, the most optimal implementation should be selected dynamically upon object marshalling.
> 1. Write field index and offset for the only non-null fields in footer.
> Should be used for objects with a few non-null fields.
> 2. Write footer header then field offsets for the only non-null fields as follows
> [0] bit mask for first 8 fields, 0 - field is null, 1 - field is non-null
> [1] cumulative sum of "1" bits
> [2] bit mask for the next 8 fields
> [3] cumulative sum of "1" bits
> ... and so on
> [N1...N2] offset of first non-null field
> [N3...N4] offset of next non-null field
> ... and so on
> If we want to read fields from 0 to 7, then we read first footer byte, step through bits and find the offset index for non-null field or find that field is null.
> If we want to read fields from 8, then we read two footer bytes, take start offset index from the first byte, and then step through bits and find the offset index for non-null field or find that field is null.
> This supports up to 255 non-null fields per binary object.
> Overhead would be only 24 bytes per 100 null fields instead of 200 bytes for the middle-sized object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)