You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by Chrystophe Vergnaud <ch...@gmail.com> on 2021/12/16 12:27:27 UTC

using multiple columns Index

Hello,

I'm running an ignite 2.10 and I don't understand the behavior of the
multi-columns index.

For instance, I have a table t(id, a,b,c, d, e, f, g)
- id is a uuid and is the key
- a is a TIMESTAMP
- b is a  SMALLINT
- c is a TINYINT
- e, f, g are VARCHAR

this table have around 200M lines

I have to select data based on a, b, c in this order, so basically, I have
setup an index on (a,b,c)

If I apply a select with a WHERE clause on "a>=x AND a < y", it works
perfectly, the response time is ok (using USE INDEX)

If I add the b in the WHERE clause I expect to optimize the response time
but it is not the case. Worst, if the value of b is not present in the
slice, it is responding as if the b was not in the WHERE clause at all (it
seems to run a full scan on the sub-result)

Do I miss something ? is it related to the implementation of the B+tree ?

Thanks in advance for your help.

Best regards,

Chrystophe Vergnaud
Architect @ Cyblex Technologies

Re: using multiple columns Index

Posted by Maksim Timonin <ti...@apache.org>.

Hi, Chrystophe!

> But do you have a tweak in mind to get better response time on this kind
of request ?

Do you mean queries that return an empty result set? Unfortunately, I'm not
aware of ways of improving performance of such queries.

Also, I don't know the task you're actually solving. Maybe it's possible to
avoid such queries by replacing it with some different queries or logic, or
data schema, if latency is critical.


On Thu, Dec 16, 2021 at 5:26 PM Chrystophe Vergnaud <
chrystophe.vergnaud@gmail.com> wrote:

> Hi Maxim,
>
> Thanks for the details.
>
> But do you have a tweak in mind to get better response time on this kind
> of request ?
>
> BR,
>
> Chrystophe Vergnaud
> Architect @ Cyblex Technologies
>
>
>
> Le jeu. 16 déc. 2021 à 15:09, Maksim Timonin <ti...@apache.org> a
> écrit :
>
>> Hi, Chrystophe!
>>
>> Multifield index should perfectly work for cases with strict equality
>> like: (a == ? && b == ?) OR (a == ? && b > ?) . But for queries with range
>> queries for first field "a" ("a" > ? && b ...) you should not expect a
>> boost of performance. You're right, it's due to B+Tree implementation - we
>> store data in pairs (a, b). And then in the index storage the sequence will
>> be: (1, 1), (1, 2), (1, 3), (2, 1). We sort by A, and only in case of
>> equality field A, we check field B.
>>
>> So for queries like (a > 0 && b < 10) there is not much help from the
>> condition on B for reducing data slice. Our implementation doesn't skip
>> some sub-trees for conditions, but it checks tree range sequentially.
>>
>> > Worst, if the value of b is not present in the slice, it is responding
>> as if the b was not in the WHERE clause at all (it seems to run a full scan
>> on the sub-result)
>>
>> I think it can depend on the amount of data you return. If there is no
>> data suitable for your condition, you will hang until the query finishes.
>> But if you have some, it will return the cursor earlier, after preparing
>> the first page for response, see SqlFieldsQuery.setPageSize().
>>
>> Also, performance may depend on your index selectivity.
>>
>>
>> On Thu, Dec 16, 2021 at 4:23 PM Chrystophe Vergnaud <
>> chrystophe.vergnaud@gmail.com> wrote:
>>
>>> Hello Stephen,
>>>
>>> I was created with SQL :
>>> CREATE INDEX IF NOT EXISTS "t_idx_1" ON MYSCHEMA."t" ("a", "b", "c");
>>>
>>> BR,
>>>
>>> Chrystophe Vergnaud
>>> Architect @ Cyblex Technologies
>>>
>>>
>>> Le jeu. 16 déc. 2021 à 13:51, Stephen Darlington <
>>> stephen.darlington@gridgain.com> a écrit :
>>>
>>>> Can you show how you’ve defined your index(es)?
>>>>
>>>> > On 16 Dec 2021, at 12:27, Chrystophe Vergnaud <
>>>> chrystophe.vergnaud@gmail.com> wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > I'm running an ignite 2.10 and I don't understand the behavior of the
>>>> multi-columns index.
>>>> >
>>>> > For instance, I have a table t(id, a,b,c, d, e, f, g)
>>>> > - id is a uuid and is the key
>>>> > - a is a TIMESTAMP
>>>> > - b is a  SMALLINT
>>>> > - c is a TINYINT
>>>> > - e, f, g are VARCHAR
>>>> >
>>>> > this table have around 200M lines
>>>> >
>>>> > I have to select data based on a, b, c in this order, so basically, I
>>>> have setup an index on (a,b,c)
>>>> >
>>>> > If I apply a select with a WHERE clause on "a>=x AND a < y", it works
>>>> perfectly, the response time is ok (using USE INDEX)
>>>> >
>>>> > If I add the b in the WHERE clause I expect to optimize the response
>>>> time but it is not the case. Worst, if the value of b is not present in the
>>>> slice, it is responding as if the b was not in the WHERE clause at all (it
>>>> seems to run a full scan on the sub-result)
>>>> >
>>>> > Do I miss something ? is it related to the implementation of the
>>>> B+tree ?
>>>> >
>>>> > Thanks in advance for your help.
>>>> >
>>>> > Best regards,
>>>> >
>>>> > Chrystophe Vergnaud
>>>> > Architect @ Cyblex Technologies
>>>>
>>>>
>>>>

Re: using multiple columns Index

Posted by Chrystophe Vergnaud <ch...@gmail.com>.

Hi Maxim,

Thanks for the details.

But do you have a tweak in mind to get better response time on this kind of
request ?

BR,

Chrystophe Vergnaud
Architect @ Cyblex Technologies



Le jeu. 16 déc. 2021 à 15:09, Maksim Timonin <ti...@apache.org> a
écrit :

> Hi, Chrystophe!
>
> Multifield index should perfectly work for cases with strict equality
> like: (a == ? && b == ?) OR (a == ? && b > ?) . But for queries with range
> queries for first field "a" ("a" > ? && b ...) you should not expect a
> boost of performance. You're right, it's due to B+Tree implementation - we
> store data in pairs (a, b). And then in the index storage the sequence will
> be: (1, 1), (1, 2), (1, 3), (2, 1). We sort by A, and only in case of
> equality field A, we check field B.
>
> So for queries like (a > 0 && b < 10) there is not much help from the
> condition on B for reducing data slice. Our implementation doesn't skip
> some sub-trees for conditions, but it checks tree range sequentially.
>
> > Worst, if the value of b is not present in the slice, it is responding
> as if the b was not in the WHERE clause at all (it seems to run a full scan
> on the sub-result)
>
> I think it can depend on the amount of data you return. If there is no
> data suitable for your condition, you will hang until the query finishes.
> But if you have some, it will return the cursor earlier, after preparing
> the first page for response, see SqlFieldsQuery.setPageSize().
>
> Also, performance may depend on your index selectivity.
>
>
> On Thu, Dec 16, 2021 at 4:23 PM Chrystophe Vergnaud <
> chrystophe.vergnaud@gmail.com> wrote:
>
>> Hello Stephen,
>>
>> I was created with SQL :
>> CREATE INDEX IF NOT EXISTS "t_idx_1" ON MYSCHEMA."t" ("a", "b", "c");
>>
>> BR,
>>
>> Chrystophe Vergnaud
>> Architect @ Cyblex Technologies
>>
>>
>> Le jeu. 16 déc. 2021 à 13:51, Stephen Darlington <
>> stephen.darlington@gridgain.com> a écrit :
>>
>>> Can you show how you’ve defined your index(es)?
>>>
>>> > On 16 Dec 2021, at 12:27, Chrystophe Vergnaud <
>>> chrystophe.vergnaud@gmail.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I'm running an ignite 2.10 and I don't understand the behavior of the
>>> multi-columns index.
>>> >
>>> > For instance, I have a table t(id, a,b,c, d, e, f, g)
>>> > - id is a uuid and is the key
>>> > - a is a TIMESTAMP
>>> > - b is a  SMALLINT
>>> > - c is a TINYINT
>>> > - e, f, g are VARCHAR
>>> >
>>> > this table have around 200M lines
>>> >
>>> > I have to select data based on a, b, c in this order, so basically, I
>>> have setup an index on (a,b,c)
>>> >
>>> > If I apply a select with a WHERE clause on "a>=x AND a < y", it works
>>> perfectly, the response time is ok (using USE INDEX)
>>> >
>>> > If I add the b in the WHERE clause I expect to optimize the response
>>> time but it is not the case. Worst, if the value of b is not present in the
>>> slice, it is responding as if the b was not in the WHERE clause at all (it
>>> seems to run a full scan on the sub-result)
>>> >
>>> > Do I miss something ? is it related to the implementation of the
>>> B+tree ?
>>> >
>>> > Thanks in advance for your help.
>>> >
>>> > Best regards,
>>> >
>>> > Chrystophe Vergnaud
>>> > Architect @ Cyblex Technologies
>>>
>>>
>>>

Re: using multiple columns Index

Posted by Maksim Timonin <ti...@apache.org>.

Hi, Chrystophe!

Multifield index should perfectly work for cases with strict equality like:
(a == ? && b == ?) OR (a == ? && b > ?) . But for queries with range
queries for first field "a" ("a" > ? && b ...) you should not expect a
boost of performance. You're right, it's due to B+Tree implementation - we
store data in pairs (a, b). And then in the index storage the sequence will
be: (1, 1), (1, 2), (1, 3), (2, 1). We sort by A, and only in case of
equality field A, we check field B.

So for queries like (a > 0 && b < 10) there is not much help from the
condition on B for reducing data slice. Our implementation doesn't skip
some sub-trees for conditions, but it checks tree range sequentially.

> Worst, if the value of b is not present in the slice, it is responding as
if the b was not in the WHERE clause at all (it seems to run a full scan on
the sub-result)

I think it can depend on the amount of data you return. If there is no data
suitable for your condition, you will hang until the query finishes. But if
you have some, it will return the cursor earlier, after preparing the first
page for response, see SqlFieldsQuery.setPageSize().

Also, performance may depend on your index selectivity.


On Thu, Dec 16, 2021 at 4:23 PM Chrystophe Vergnaud <
chrystophe.vergnaud@gmail.com> wrote:

> Hello Stephen,
>
> I was created with SQL :
> CREATE INDEX IF NOT EXISTS "t_idx_1" ON MYSCHEMA."t" ("a", "b", "c");
>
> BR,
>
> Chrystophe Vergnaud
> Architect @ Cyblex Technologies
>
>
> Le jeu. 16 déc. 2021 à 13:51, Stephen Darlington <
> stephen.darlington@gridgain.com> a écrit :
>
>> Can you show how you’ve defined your index(es)?
>>
>> > On 16 Dec 2021, at 12:27, Chrystophe Vergnaud <
>> chrystophe.vergnaud@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > I'm running an ignite 2.10 and I don't understand the behavior of the
>> multi-columns index.
>> >
>> > For instance, I have a table t(id, a,b,c, d, e, f, g)
>> > - id is a uuid and is the key
>> > - a is a TIMESTAMP
>> > - b is a  SMALLINT
>> > - c is a TINYINT
>> > - e, f, g are VARCHAR
>> >
>> > this table have around 200M lines
>> >
>> > I have to select data based on a, b, c in this order, so basically, I
>> have setup an index on (a,b,c)
>> >
>> > If I apply a select with a WHERE clause on "a>=x AND a < y", it works
>> perfectly, the response time is ok (using USE INDEX)
>> >
>> > If I add the b in the WHERE clause I expect to optimize the response
>> time but it is not the case. Worst, if the value of b is not present in the
>> slice, it is responding as if the b was not in the WHERE clause at all (it
>> seems to run a full scan on the sub-result)
>> >
>> > Do I miss something ? is it related to the implementation of the B+tree
>> ?
>> >
>> > Thanks in advance for your help.
>> >
>> > Best regards,
>> >
>> > Chrystophe Vergnaud
>> > Architect @ Cyblex Technologies
>>
>>
>>

Re: using multiple columns Index

Posted by Chrystophe Vergnaud <ch...@gmail.com>.

Hello Stephen,

I was created with SQL :
CREATE INDEX IF NOT EXISTS "t_idx_1" ON MYSCHEMA."t" ("a", "b", "c");

BR,

Chrystophe Vergnaud
Architect @ Cyblex Technologies


Le jeu. 16 déc. 2021 à 13:51, Stephen Darlington <
stephen.darlington@gridgain.com> a écrit :

> Can you show how you’ve defined your index(es)?
>
> > On 16 Dec 2021, at 12:27, Chrystophe Vergnaud <
> chrystophe.vergnaud@gmail.com> wrote:
> >
> > Hello,
> >
> > I'm running an ignite 2.10 and I don't understand the behavior of the
> multi-columns index.
> >
> > For instance, I have a table t(id, a,b,c, d, e, f, g)
> > - id is a uuid and is the key
> > - a is a TIMESTAMP
> > - b is a  SMALLINT
> > - c is a TINYINT
> > - e, f, g are VARCHAR
> >
> > this table have around 200M lines
> >
> > I have to select data based on a, b, c in this order, so basically, I
> have setup an index on (a,b,c)
> >
> > If I apply a select with a WHERE clause on "a>=x AND a < y", it works
> perfectly, the response time is ok (using USE INDEX)
> >
> > If I add the b in the WHERE clause I expect to optimize the response
> time but it is not the case. Worst, if the value of b is not present in the
> slice, it is responding as if the b was not in the WHERE clause at all (it
> seems to run a full scan on the sub-result)
> >
> > Do I miss something ? is it related to the implementation of the B+tree ?
> >
> > Thanks in advance for your help.
> >
> > Best regards,
> >
> > Chrystophe Vergnaud
> > Architect @ Cyblex Technologies
>
>
>

Re: using multiple columns Index

Posted by Stephen Darlington <st...@gridgain.com>.

Can you show how you’ve defined your index(es)?

> On 16 Dec 2021, at 12:27, Chrystophe Vergnaud <ch...@gmail.com> wrote:
> 
> Hello,
> 
> I'm running an ignite 2.10 and I don't understand the behavior of the multi-columns index.
> 
> For instance, I have a table t(id, a,b,c, d, e, f, g)
> - id is a uuid and is the key
> - a is a TIMESTAMP
> - b is a  SMALLINT
> - c is a TINYINT
> - e, f, g are VARCHAR
> 
> this table have around 200M lines
> 
> I have to select data based on a, b, c in this order, so basically, I have setup an index on (a,b,c)
> 
> If I apply a select with a WHERE clause on "a>=x AND a < y", it works perfectly, the response time is ok (using USE INDEX)
> 
> If I add the b in the WHERE clause I expect to optimize the response time but it is not the case. Worst, if the value of b is not present in the slice, it is responding as if the b was not in the WHERE clause at all (it seems to run a full scan on the sub-result)
> 
> Do I miss something ? is it related to the implementation of the B+tree ?
> 
> Thanks in advance for your help.
> 
> Best regards,
> 
> Chrystophe Vergnaud
> Architect @ Cyblex Technologies