Posted to dev@ignite.apache.org by "Vladimir Ozerov (JIRA)" <ji...@apache.org> on 2015/09/25 13:21:04 UTC
[jira] [Created] (IGNITE-1549) Optimize portable object fields write in non-raw mode.

Vladimir Ozerov created IGNITE-1549:
---------------------------------------

             Summary: Optimize portable object fields write in non-raw mode.
                 Key: IGNITE-1549
                 URL: https://issues.apache.org/jira/browse/IGNITE-1549
             Project: Ignite
          Issue Type: Task
          Components: general
    Affects Versions: 1.1.4
            Reporter: Vladimir Ozerov
            Priority: Blocker
             Fix For: ignite-1.5


Currently we write user fields as follows:
0 .. 3 - field ID;
4 - field type;
5 .. 8 - field len;
9 .. - the field itself.
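
For illustration, the current layout of a single int field could be produced as in the sketch below. The class name, the TYPE_INT code and the little-endian byte order are assumptions made for this example only, not Ignite's actual constants or API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CurrentLayoutSketch {
    // Hypothetical type code; the real constant may differ.
    static final byte TYPE_INT = 3;

    /** Writes one int field as: 4-byte field ID, 1-byte type, 4-byte length, 4-byte value. */
    static byte[] writeIntField(int fieldId, int value) {
        // Byte order is an assumption of this sketch.
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4 + 4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(fieldId); // bytes 0 .. 3: field ID
        buf.put(TYPE_INT);   // byte 4: field type
        buf.putInt(4);       // bytes 5 .. 8: field len (always 4 for an int)
        buf.putInt(value);   // bytes 9 .. 12: the field itself
        return buf.array();
    }
}
```

The result is 13 bytes long, 4 of which only repeat what the type code already implies.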

It can be optimized as follows:
1) Field len usually can be inferred from type. E.g., for int it is 4.
2) Frequently used constants can be written as separate types. E.g. INT - normal int, INT_0 - zero, etc.
3) Last, but not least, values should be encoded using a "variable bytes" (and possibly ZigZag) algorithm. This should save about 2 bytes per int or long on average (I assume here that longs usually need more than 4 bytes, e.g. timestamps).
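
As a rough sketch of point 3, the generic ZigZag and variable-byte techniques for a 32-bit int look roughly like this. This is the textbook form (7 payload bits per byte, continuation flag in the high bit), not necessarily the encoding this task would end up with; all names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

final class VarIntSketch {
    /** ZigZag maps values of small magnitude, positive or negative, to small unsigned values. */
    static int zigZag(int v) {
        return (v << 1) ^ (v >> 31);
    }

    /** Inverse of zigZag. */
    static int unZigZag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Writes 7 bits per byte, low bits first; a set high bit means "more bytes follow". */
    static byte[] writeVarInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    /** Reads a value produced by writeVarInt. */
    static int readVarInt(byte[] bytes) {
        int res = 0;
        int shift = 0;
        for (byte b : bytes) {
            res |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                break;
            shift += 7;
        }
        return res;
    }
}
```

With ZigZag applied first, -1 occupies a single byte instead of the five bytes a plain variable-byte encoding of a negative int would need.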

*New types will be introduced:*
1) Booleans: BOOL_FALSE, BOOL_TRUE;
2) Bytes: BYTE_C0 => zero, BYTE_C1 => 1, BYTE_C1N => -1;
3) Shorts, chars: SHORT_C0, SHORT_C1, SHORT_C1N;
4) Ints: INT_C0, INT_C1, INT_C1N, INT_1 - int which fits into 1 byte, INT_1N - same for negative value, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N.
5) Longs: same as ints, but have only 2, 4, 6 and 8 byte count discriminators to avoid excessive calculations.
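
A possible way to pick the int type on write is sketched below. The type names come from the list above; how a negative value is stored is not specified here, so the sketch assumes its bitwise complement is written (an assumption that conveniently avoids overflow on Integer.MIN_VALUE). The class and method names are hypothetical.

```java
final class IntTypeSketch {
    enum IntType { INT_C0, INT_C1, INT_C1N, INT_1, INT_1N, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N }

    /** Chooses the most compact type for the given value. */
    static IntType select(int v) {
        // Frequently used constants get dedicated types with no payload at all.
        if (v == 0) return IntType.INT_C0;
        if (v == 1) return IntType.INT_C1;
        if (v == -1) return IntType.INT_C1N;

        boolean neg = v < 0;
        // Assumption: negative values are stored as their bitwise complement,
        // which is always non-negative.
        int mag = neg ? ~v : v;

        switch (payloadBytes(mag)) {
            case 1: return neg ? IntType.INT_1N : IntType.INT_1;
            case 2: return neg ? IntType.INT_2N : IntType.INT_2;
            case 3: return neg ? IntType.INT_3N : IntType.INT_3;
            default: return neg ? IntType.INT_4N : IntType.INT_4;
        }
    }

    /** Number of bytes needed to hold a non-negative magnitude. */
    static int payloadBytes(int mag) {
        if ((mag >>> 8) == 0) return 1;
        if ((mag >>> 16) == 0) return 2;
        if ((mag >>> 24) == 0) return 3;
        return 4;
    }
}
```

Under these assumptions 200 maps to INT_1, -200 to INT_1N and 70000 to INT_3.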

It means that instead of the previous 6 integer types, we will have 2 + 3 + 3 + 3 + 11 + 11 = 33 types.

To avoid excessive switches or (even worse) array/map lookups to determine the type, we can divide the whole type space (256 values) into two parts: optimized and non-optimized. Types in the optimized space will have the MSB set to 1, and the ~30 optimized types mentioned above (or some of them) will be located there.
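
A minimal sketch of that split, with made-up type codes (none of the constants below are real Ignite values): the reader needs a single bit test to tell which space a type code belongs to.

```java
final class TypeSpaceSketch {
    /** MSB of the one-byte type code marks the optimized space. */
    static final int OPTIMIZED_FLAG = 0x80;

    // Hypothetical codes, chosen only for this example.
    static final byte TYPE_INT = 3;
    static final byte TYPE_INT_C0 = (byte) (OPTIMIZED_FLAG | 0x10);

    /** One branch: is this type code in the optimized space? */
    static boolean isOptimized(byte typeCode) {
        // For a signed Java byte this is equivalent to typeCode < 0.
        return (typeCode & OPTIMIZED_FLAG) != 0;
    }
}
```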

For floats and doubles we simply infer length. 

For primitive arrays we do not write the field length followed by the array length, but only the array length.
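
For an int array this could look roughly as follows. The type code and the names are hypothetical; the only point of the sketch is that a single length value is written.

```java
import java.nio.ByteBuffer;

final class ArrayFieldSketch {
    // Hypothetical type code; the real constant may differ.
    static final byte TYPE_INT_ARR = 14;

    /** Writes: 4-byte field ID, 1-byte type, 4-byte array length, then the elements. */
    static byte[] writeIntArrayField(int fieldId, int[] arr) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4 + 4 * arr.length);
        buf.putInt(fieldId);
        buf.put(TYPE_INT_ARR);
        buf.putInt(arr.length); // array length only, no separate field length
        for (int v : arr)
            buf.putInt(v);
        return buf.array();
    }
}
```

The byte size of the field follows from the element type and the array length, so nothing is lost by dropping the field length.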

*Expected compaction*:
bool: 10 -> 5 bytes (50%);
byte: 10 -> 5-6 bytes (45%);
short, char: 11 -> 5-7 bytes, 7 on average (35%);
int: 13 -> 5-9 bytes, 7 on average (45%);
long: 17 -> 5-13 bytes, 11 on average (35%);
float: 13 -> 9 bytes (30%);
double: 17 -> 13 bytes (25%).
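
The figures above follow from simple arithmetic: every field keeps its 5-byte header (field ID + type), the current format adds a 4-byte length plus the full value, and the proposed format adds only the payload that is actually written. A small sketch of that calculation, with hypothetical names:

```java
final class CompactionSketch {
    /** 4-byte field ID + 1-byte field type, present in both formats. */
    static final int HEADER = 4 + 1;

    /** Current format: header, 4-byte field length, full-width value. */
    static int currentSize(int valueBytes) {
        return HEADER + 4 + valueBytes;
    }

    /** Proposed format: header plus only the payload bytes written (0 for constants). */
    static int proposedSize(int payloadBytes) {
        return HEADER + payloadBytes;
    }

    /** Saved share of the original size, in percent. */
    static double savedPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }
}
```

For example, currentSize(4) is 13 and proposedSize(4) is 9, which gives the float figure; proposedSize(0) is 5, the lower bound for every type.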

*Expected CPU overhead on writes:*
Bool, float, double: -
Byte, short, char: zero check, sign check;
Int, long: two (shift + OR) operations to determine the byte count; if small, "zero" and "one" checks; if big, a sign check.

*Expected CPU overhead on reads:*
One additional branch between the optimized and non-optimized spaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)