Posted to dev@ignite.apache.org by Evgeniy Rudenko <e....@gmail.com> on 2020/08/19 06:06:40 UTC

Update of the default inline size for variable types

Hi guys,

Currently, if a variable-length type (such as String or byte[]) is
encountered in a composite index, the inline size just defaults to 10,
which is almost always not enough. I am going to change this and
implement the following changes:

1) For a variable-length column in a single-column index, keep using 10
as the default size. But if the index is composite, the default inline
size will be calculated as the sum of the sizes of all indexed columns.
For example, for an index like (INT, VARCHAR, VARCHAR, INT) the default
inline size will be 5 + 10 + 10 + 5 = 30 (5 for each INT, 10 for each
string).

2) For SQL VARCHAR and BINARY columns with a defined length (for example
VARCHAR(XX)), use XX + 3 as the default inline size for the column (3
extra bytes are needed for the internal representation of the type).

3) The maximum default inline size will still be limited by
IGNITE_MAX_INDEX_PAYLOAD_SIZE, but its default value will be increased to
64. For example, for the index (VARCHAR, VARCHAR, VARCHAR, VARCHAR,
VARCHAR, VARCHAR, VARCHAR) the default inline size will be only 64. The
same goes for columns with a defined length: by default, a VARCHAR(100)
column will create an index with an inline size of only 64.

Please let me know if you have any concerns. The update can be found at
https://github.com/apache/ignite/pull/8161

Best regards,
Evgeniy

Re[2]: Update of the default inline size for variable types

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.
Huge +1 to Ilya.
I checked your PR; this looks like a stub: Pattern.compile("\\w+\\((\\d+)\\)");
* Is there any normalization before it? A type with whitespace inside the parentheses, e.g. VARCHAR( N), does not seem to match.
* Can we obtain this information without a regexp?
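To illustrate the whitespace concern, the pattern quoted from the PR is compared below with a hypothetical whitespace-tolerant variant. The sketch assumes the type name reaches the pattern as a plain string such as "VARCHAR( 100)" and is matched with Matcher.matches(); whether the PR actually sees normalized type names (the open question above) is not known here. The class name and the LENIENT pattern are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarcharLengthRegexDemo {
    // Pattern as quoted in the review comment above.
    static final Pattern STRICT = Pattern.compile("\\w+\\((\\d+)\\)");

    // Hypothetical variant that tolerates whitespace around the parentheses and the number.
    static final Pattern LENIENT = Pattern.compile("\\w+\\s*\\(\\s*(\\d+)\\s*\\)");

    // Declared length, or -1 if the whole type name does not match.
    static int declaredLength(Pattern p, String typeName) {
        Matcher m = p.matcher(typeName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"VARCHAR(100)", "VARCHAR( 100)", "VARCHAR (100)"}) {
            System.out.println(t + " strict=" + declaredLength(STRICT, t)
                + " lenient=" + declaredLength(LENIENT, t));
        }
    }
}
```

The strict pattern yields 100 only for "VARCHAR(100)" and -1 for the other two inputs; the lenient variant yields 100 for all three.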

Re: Update of the default inline size for variable types

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I can see where you are going with this, but, as far as my experience
tells me, 64 is already too large for the average use case. It will also
start to drag down performance, since you won't fit as many entries in one
page anymore and your tree grows taller, not to mention more I/O.

I think we should benchmark it and see at which value performance declines
sharply. Maybe 64 is OK after all, if it's the maximum for a composite
index. Just make sure that a single VARCHAR without length is still 10
and not 64.
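A back-of-the-envelope sketch of the page-density argument. The 4 KB page is Ignite's default page size; the 40-byte page header and the 8-byte per-item link are assumptions made for illustration, not Ignite's exact B+ tree page layout, so only the trend matters, not the absolute numbers.

```java
public class InlineFanoutEstimate {
    static final int PAGE_SIZE = 4096;   // Ignite's default page size (4 KB)
    static final int PAGE_HEADER = 40;   // assumed per-page header overhead (illustrative)
    static final int LINK_SIZE = 8;      // assumed per-item link to the data row (illustrative)

    // Rough number of index items that fit in one page for a given inline size.
    static int itemsPerPage(int inlineSize) {
        return (PAGE_SIZE - PAGE_HEADER) / (LINK_SIZE + inlineSize);
    }

    // Smallest tree height h such that fanout^h >= rows.
    static int levelsNeeded(long rows, int fanout) {
        if (fanout < 2) {
            throw new IllegalArgumentException("fanout must be at least 2");
        }
        int h = 1;
        long capacity = fanout;
        while (capacity < rows) {
            capacity *= fanout;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        for (int inlineSize : new int[] {10, 30, 64}) {
            int fanout = itemsPerPage(inlineSize);
            System.out.println("inline=" + inlineSize + " items/page=" + fanout
                + " levels for 1M rows=" + levelsNeeded(rows, fanout));
        }
    }
}
```

Under these assumptions the estimate gives 225, 106 and 56 items per page for inline sizes 10, 30 and 64, and 3, 3 and 4 levels for a million rows.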

Regards,
-- 
Ilya Kasnacheev


On Thu, Aug 20, 2020 at 11:15, Evgeniy Rudenko <e....@gmail.com> wrote:


Re: Update of the default inline size for variable types

Posted by Evgeniy Rudenko <e....@gmail.com>.
Hi guys,

Thank you for your feedback.

The current calculation of the default size is not entirely correct. If it
meets a variable-length field (such as a byte array or a string), it just
stops any attempt to choose a reasonable index size and uses
IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT as its size. Such an approach doesn't
seem correct to me in any case. The first part of the update changes this
logic and calculates the size based on all indexed columns. This update can
even save some space for users with varchars and a high
IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT value.
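A simplified sketch of the difference described in the paragraph above. It models the current behaviour as "fall back to the configured maximum as soon as a variable-length column appears" and the proposed one as "sum all columns, counting each variable-length column as 10". The column encoding (fixed sizes as positive ints, VARLEN as a marker) and the method names are hypothetical. With a configured maximum of 64, an (INT, VARCHAR) index gets 64 bytes under the current model but only 15 under the proposal, which is the space saving mentioned above.

```java
public class OldVsNewInlineSize {
    static final int VARLEN = -1;         // hypothetical marker for a variable-length column
    static final int VARLEN_DEFAULT = 10; // per-column default from the proposal

    // Simplified model of the current logic: give up on the first variable-length
    // column and use the configured maximum for the whole index.
    static int currentLogic(int[] colSizes, int maxPayload) {
        int sum = 0;
        for (int size : colSizes) {
            if (size == VARLEN) {
                return maxPayload;
            }
            sum += size;
        }
        return Math.min(sum, maxPayload);
    }

    // Proposed logic: sum every indexed column, counting variable-length ones as 10.
    static int proposedLogic(int[] colSizes, int maxPayload) {
        int sum = 0;
        for (int size : colSizes) {
            sum += size == VARLEN ? VARLEN_DEFAULT : size;
        }
        return Math.min(sum, maxPayload);
    }

    public static void main(String[] args) {
        int[] intAndVarchar = {5, VARLEN}; // (INT, VARCHAR)
        System.out.println(currentLogic(intAndVarchar, 64));  // 64
        System.out.println(proposedLogic(intAndVarchar, 64)); // 15
    }
}
```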

The second part of the update increases IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT.
Please note that we are changing only the upper bound of the default size.
Obviously this can lead to some increase in used space, but we are trading
size for speed here. The current default value is too small for the
average use case. Users who care about data size can still set the exact
size of each index or limit all sizes with
IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT. So after the update, users who want
to keep the previous data size will just need to set
IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT=10.
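A minimal sketch of the two knobs mentioned above: an explicit per-index size (in Ignite this can be set, for example, with the INLINE_SIZE clause of CREATE INDEX) and a global cap on the computed default. The thread refers to the cap both as IGNITE_MAX_INDEX_PAYLOAD_SIZE and IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT; this sketch reads it as a JVM system property named IGNITE_MAX_INDEX_PAYLOAD_SIZE, as in the opening message. The resolution order (explicit size used as-is, only the computed default capped) is an assumption, and the class and method names are hypothetical.

```java
public class EffectiveInlineSize {
    // Name as used in the opening message; 64 is the proposed new default.
    static final String MAX_PROP = "IGNITE_MAX_INDEX_PAYLOAD_SIZE";
    static final int PROPOSED_DEFAULT_MAX = 64;

    // Hypothetical resolution order: an explicitly configured inline size is used as-is,
    // and only the computed default is capped by the system property.
    static int effectiveInlineSize(Integer explicitInlineSize, int computedDefault) {
        if (explicitInlineSize != null) {
            return explicitInlineSize;
        }
        int max = Integer.getInteger(MAX_PROP, PROPOSED_DEFAULT_MAX);
        return Math.min(computedDefault, max);
    }

    public static void main(String[] args) {
        System.clearProperty(MAX_PROP);
        System.out.println(effectiveInlineSize(null, 30)); // 30: below the proposed cap of 64
        System.setProperty(MAX_PROP, "10");                // same effect as -DIGNITE_MAX_INDEX_PAYLOAD_SIZE=10
        System.out.println(effectiveInlineSize(null, 30)); // 10: back to the old size
        System.out.println(effectiveInlineSize(48, 30));   // 48: explicit size wins
    }
}
```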



-- 
Best regards,
Evgeniy

On Wed, Aug 19, 2020 at 5:20 PM Vladislav Pyatkov <vl...@gmail.com> wrote:

Re: Update of the default inline size for variable types

Posted by Vladislav Pyatkov <vl...@gmail.com>.
Hi,

In my opinion, an inline size of 64 may significantly increase the storage
size, and this can be difficult for users to understand.

As I remember, we earlier planned to replace the inline value with a hash
code when the value is larger than the inline size. That would help with
"==" and "!=" comparisons without increasing the storage size.
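A minimal sketch of the hash-code idea, under stated assumptions: the 10-byte inline budget, the Inlined holder and Arrays.hashCode as the hash function are hypothetical choices, not an existing Ignite format. It shows why such a hash helps "==" and "!=" without growing the inline size: different hashes prove inequality, while equal hashes may be a collision, so the full row still has to be read. A hash also carries no ordering, so it would not help range predicates such as < or >.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HashInlineSketch {
    static final int INLINE_SIZE = 10; // hypothetical inline budget in bytes

    enum Eq { EQUAL, NOT_EQUAL, UNKNOWN }

    // Hypothetical inline entry: raw bytes if the value fits, otherwise a hash of the full value.
    static final class Inlined {
        final boolean isHash;
        final byte[] raw; // set when the value fits
        final int hash;   // set when it does not

        Inlined(boolean isHash, byte[] raw, int hash) {
            this.isHash = isHash;
            this.raw = raw;
            this.hash = hash;
        }
    }

    static Inlined inline(String value) {
        byte[] b = value.getBytes(StandardCharsets.UTF_8);
        return b.length <= INLINE_SIZE
            ? new Inlined(false, b, 0)
            : new Inlined(true, null, Arrays.hashCode(b));
    }

    // Equality decided from inlined data only; UNKNOWN means "read the full row".
    static Eq inlineEquals(Inlined a, Inlined b) {
        if (a.isHash != b.isHash) {
            return Eq.NOT_EQUAL; // one value fits and the other does not, so their lengths differ
        }
        if (!a.isHash) {
            return Arrays.equals(a.raw, b.raw) ? Eq.EQUAL : Eq.NOT_EQUAL;
        }
        // Different hashes prove inequality; equal hashes may be a collision.
        return a.hash != b.hash ? Eq.NOT_EQUAL : Eq.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(inlineEquals(inline("short"), inline("short")));                             // EQUAL
        System.out.println(inlineEquals(inline("short"), inline("a much longer value")));               // NOT_EQUAL
        System.out.println(inlineEquals(inline("a much longer value"), inline("a much longer value"))); // UNKNOWN
    }
}
```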

I think the hash code optimization is preferable, and as a last resort
anyone can increase the inline size through the API.


-- 
Vladislav Pyatkov

On Wed, Aug 19, 2020 at 9:22 AM Zhenya Stanilovsky <ar...@mail.ru.invalid> wrote:

Re: Update of the default inline size for variable types

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.

>Hi guys,
 
Evgeniy, hola!
>
>Currently, if a variable-length type (such as String or byte[]) is
>encountered in a composite index, the inline size just defaults to 10,
>which is almost always not enough. I am going to change this and
>implement the following changes:
>
>1) For a variable-length column in a single-column index, keep using 10
>as the default size. But if the index is composite, the default inline
>size will be calculated as the sum of the sizes of all indexed columns.
>For example, for an index like (INT, VARCHAR, VARCHAR, INT) the default
>inline size will be 5 + 10 + 10 + 5 = 30 (5 for each INT, 10 for each
>string).
 
Why exactly this approach? Why not 5 + 10 and that's it? Do you have some logical basis, a statistical distribution or something similar? For now this looks like your own decision and nothing more, or am I wrong?
>
>2) For SQL VARCHAR and BINARY columns with a defined length (for example
>VARCHAR(XX)), use XX + 3 as the default inline size for the column (3
>extra bytes are needed for the internal representation of the type).
 
The same question here: why do you want to cover the full VARCHAR length? Have you compared this with other vendors' approaches?
>
>3) The maximum default inline size will still be limited by
>IGNITE_MAX_INDEX_PAYLOAD_SIZE, but its default value will be increased to
>64. For example, for the index (VARCHAR, VARCHAR, VARCHAR, VARCHAR,
>VARCHAR, VARCHAR, VARCHAR) the default inline size will be only 64. The
>same goes for columns with a defined length: by default, a VARCHAR(100)
>column will create an index with an inline size of only 64.
>
>Please let me know if you have any concerns. The update can be found at
>https://github.com/apache/ignite/pull/8161
>
>Best regards,
>Evgeniy
>