Posted to user@ignite.apache.org by Lucky <wa...@163.com> on 2017/09/19 02:07:23 UTC

Re: Fetched result use too much time


Please see the attachment.
I have set query parallelism to 30; the query took 42 seconds.
But that is not enough.
I expected it to take less than 3 seconds.


Also, I have 3 nodes.


As for the number 3589: we need to check how many times the IDs in the IN condition are used. A record is one we need only if its usage count equals the number of IDs in the IN condition. This is what the business scenario requires, and I can't change it.
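A minimal sketch, assuming the query has the following shape. It filters FASSIGCUID with an IN list of 3589 IDs, groups by FDATABASEDID, and keeps groups with HAVING COUNT(*) >= 3589. The real query was only in the attachment, so this shape is inferred from Lucky's description and Vladimir's question. The plain-Java model below is not Ignite code, and its class, method and sample IDs are hypothetical. It shows the filtering logic: a group survives only if it matched every ID in the IN list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** In-memory model of the filter described above; names and row layout are hypothetical. */
public class InListMatch {

    /** rows: each element is {databaseId, assignedId}. */
    public static List<String> matchAll(List<String[]> rows, Set<String> inList) {
        // GROUP BY databaseId, collecting the distinct IN-list IDs seen for each group.
        Map<String, Set<String>> hitsPerGroup = new TreeMap<>();
        for (String[] row : rows) {
            if (inList.contains(row[1])) { // WHERE assignedId IN (...)
                hitsPerGroup.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
            }
        }
        // HAVING COUNT(DISTINCT assignedId) = size of the IN list
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : hitsPerGroup.entrySet()) {
            if (e.getValue().size() == inList.size()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> inList = new HashSet<>(Arrays.asList("u1", "u2", "u3"));
        List<String[]> rows = Arrays.asList(
                new String[] {"db1", "u1"}, new String[] {"db1", "u2"}, new String[] {"db1", "u3"},
                new String[] {"db2", "u1"}, new String[] {"db2", "u3"},
                new String[] {"db3", "u9"});
        System.out.println(matchAll(rows, inList)); // prints [db1]
    }
}
```

The model counts distinct matched IDs. That equals COUNT(*) only if each (FDATABASEDID, FASSIGCUID) pair occurs once, and that uniqueness is also an assumption.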


Thanks for your suggestion.
Lucky




On 2017-09-18 21:55, Vladimir Ozerov <vo...@gridgain.com> wrote:
Hi Lucky,


Could you please share your data model and node/cache configuration? I want to make sure that proper indexes are set; I will be able to advise something then. As a quick suggestion, you may try to increase query parallelism on your "databaseDAssignCache". Please try setting it to the number of cores on your server nodes. The relevant property is CacheConfiguration.queryParallelism. Btw, how many nodes do you have?


Also, I am struggling to understand the number "3589". Why does this number appear both in the ">= 3589" condition and as the number of parameters inside the "IN" clause?


Vladimir.
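The following is a sketch of Vladimir's suggestion, assuming the Ignite 2.x Java API is on the classpath. The class and method names are hypothetical. The code also assumes that every server node has the same core count as the JVM that builds the configuration. queryParallelism is part of the static cache configuration. Changing it for an existing cache therefore most likely means recreating the cache. That is an assumption, but it is consistent with the restart and reload Lucky describes later.

```java
import org.apache.ignite.configuration.CacheConfiguration;

public class ParallelismConfig {

    /** Hypothetical helper: one query segment per core on each node. */
    public static CacheConfiguration<Object, Object> assignCacheConfig() {
        // Core count of *this* JVM; assumed to match the server nodes.
        int cores = Runtime.getRuntime().availableProcessors();
        CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("databaseDAssignCache");
        // Split each node's part of a SQL query into this many parallel segments.
        ccfg.setQueryParallelism(cores);
        return ccfg;
    }
}
```

Andrey's later advice agrees with this: there is no point in going above the number of CPUs on a node.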


Re: Fetched result use too much time

Posted by Lucky <wa...@163.com>.
Andrey Mashenkov,
    Thank you very much!
    1. Query parallelism: this causes a problem: it fetches wrong results.
       I set it to 10. Table a has 150,000 records and table b has 12,000,000 records.
       When I query a single table, the result is correct.
       But when the SQL is like this:
           select a.id from a inner join b on a.id = b.tid
       it returns the wrong result. The result should be 11,000,000 rows, but it returns only 380,000.
       When I remove the query parallelism setting, it returns the correct result.


    2. I have modified this property and restarted the server. Because the data set is so large, it takes 4 hours to load the data into Ignite, so I have to wait.
    3. Actually, if I remove the GROUP BY clause and the HAVING condition, it takes even more time!
    4 and 5: I have tried them before, but they did not help.
Thanks again.
Lucky


On 2017-09-21 21:28, Andrey Mashenkov <an...@gmail.com> wrote:
Lucky,




1. It looks like it makes no sense to set the query parallelism level higher than the number of CPUs available on a node.


2. The map query uses the index on field FASSIGCUID, which is of type String, and the values seem to be 16-character strings (32 bytes).
By default, only values smaller than 10 bytes can be inlined in the index. An inlined value lets Ignite skip looking up the data page for the value.
You can try increasing the limit to 32 via cacheConfiguration.setSqlIndexMaxInlineSize(32) or the JVM property -DIGNITE_MAX_INDEX_PAYLOAD_SIZE=32.


3. Ignite doesn't know whether your data is collocated by FDATABASEDID (the GROUP BY field).
So Ignite can't apply the HAVING condition right away in the map phase. It has to load and merge all groups from all nodes before checking HAVING.
If it is possible to collocate the data by the GROUP BY field, you can hint this to Ignite by setting the query flag sqlFieldsQuery.setCollocated(true).
However, I'm not sure it will help much, or that H2 will be able to make any optimization here.


4. You can also force Ignite to use a different index. For example, create a group index on FDATABASEDID and FASSIGCUID, and another on the same fields in the other order.


5. Sometimes Ignite changes the join order, and this can cause an unexpected slowdown. You can try changing the join order by changing the positions of the tables in the query string.
To keep the join order exactly as written and stop Ignite from reordering the joins, use the flag sqlFieldsQuery.setEnforceJoinOrder(true).
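The reasoning in point 3 can be illustrated with a toy model in plain Java. This is a simplification, not how Ignite's map/reduce engine is implemented. Each inner list stands for the FDATABASEDID values stored on one node, and the made-up threshold 3 stands in for 3589. Applying HAVING COUNT(*) >= 3 on each node separately drops a group whose rows are spread over two nodes. That is why, without a collocation guarantee, all partial groups must be merged on the reducer first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model: why HAVING can't be finished on the map side when groups span nodes. */
public class HavingOnMapPhase {

    /** Partial COUNT(*) per group key for the rows held by one node. */
    public static Map<String, Integer> countPerGroup(List<String> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : rows) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Applies HAVING COUNT(*) >= min on each node alone; wrong if a group is split. */
    public static Set<String> havingOnMap(List<List<String>> nodes, int min) {
        Set<String> result = new TreeSet<>();
        for (List<String> node : nodes) {
            countPerGroup(node).forEach((key, count) -> {
                if (count >= min) result.add(key);
            });
        }
        return result;
    }

    /** Merges all partial counts on the "reducer" first, then applies HAVING. */
    public static Set<String> havingOnReduce(List<List<String>> nodes, int min) {
        Map<String, Integer> merged = new TreeMap<>();
        for (List<String> node : nodes) {
            countPerGroup(node).forEach((key, count) -> merged.merge(key, count, Integer::sum));
        }
        Set<String> result = new TreeSet<>();
        merged.forEach((key, count) -> {
            if (count >= min) result.add(key);
        });
        return result;
    }

    public static void main(String[] args) {
        // "db1" is split across two nodes (not collocated); "db2" is entirely on node 2.
        List<List<String>> nodes = Arrays.asList(
                Arrays.asList("db1", "db1"),
                Arrays.asList("db1", "db2", "db2", "db2"));
        System.out.println(havingOnMap(nodes, 3));    // prints [db2]
        System.out.println(havingOnReduce(nodes, 3)); // prints [db1, db2]
    }
}
```

With collocated data, where every group lives on a single node, the two methods agree. That is the guarantee setCollocated(true) asks the caller to make.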




Hope this helps.
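Below is a sketch of how Andrey's points 2 to 5 might look in code, assuming the Ignite 2.x Java API is on the classpath. Several details are hypothetical because the real data model was not posted: the value class, its String field types, the String key type, and the index names. The SQL text is left to the caller because the actual query was only in the attachment.

```java
import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;

public class AssignQueryTuning {

    /** Hypothetical value class; real field names and types may differ. */
    public static class DatabaseDAssign {
        // Point 4: two group indexes over the same pair of columns, in both orders.
        @QuerySqlField(name = "FDATABASEDID", orderedGroups = {
                @QuerySqlField.Group(name = "DB_CU_IDX", order = 0),
                @QuerySqlField.Group(name = "CU_DB_IDX", order = 1)})
        private String fdatabasedid;

        @QuerySqlField(name = "FASSIGCUID", orderedGroups = {
                @QuerySqlField.Group(name = "DB_CU_IDX", order = 1),
                @QuerySqlField.Group(name = "CU_DB_IDX", order = 0)})
        private String fassigcuid;
    }

    /** Points 2 and 4: cache configuration (String key type is an assumption). */
    public static CacheConfiguration<String, DatabaseDAssign> cacheConfig() {
        CacheConfiguration<String, DatabaseDAssign> ccfg = new CacheConfiguration<>("databaseDAssignCache");
        ccfg.setIndexedTypes(String.class, DatabaseDAssign.class);
        // Point 2: allow up to 32 bytes of each indexed value to be inlined in index pages.
        // Alternative named by Andrey: start the JVMs with -DIGNITE_MAX_INDEX_PAYLOAD_SIZE=32.
        ccfg.setSqlIndexMaxInlineSize(32);
        return ccfg;
    }

    /** Points 3 and 5: query flags; the SQL text is supplied by the caller. */
    public static List<List<?>> run(IgniteCache<?, ?> cache, String sql, Object... args) {
        SqlFieldsQuery qry = new SqlFieldsQuery(sql)
                .setArgs(args)
                // Point 3: only correct if all rows of each GROUP BY key live on one node.
                .setCollocated(true)
                // Point 5: keep the join order written in the SQL; no optimizer reordering.
                .setEnforceJoinOrder(true);
        try (QueryCursor<List<?>> cursor = cache.query(qry)) {
            return cursor.getAll();
        }
    }
}
```

As Andrey himself cautions, neither flag is guaranteed to help. setCollocated(true) is safe only if the data really is collocated by FDATABASEDID. Changing the inline size or the indexes of an existing cache most likely means recreating it and reloading the data. That is an assumption, but it is consistent with the reload Lucky describes.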


Re: Re: Re: Fetched result use too much time

Posted by Lucky <wa...@163.com>.
My ignite version is 2.2.





On 2017-10-10 20:58:55, "Andrey Mashenkov" <an...@gmail.com> wrote:

Hi Lucky,


It looks like your query selectivity is poor: even with GROUP BY, a large amount of data has to be fetched to the reduce node.


1. Is it possible to collocate the data on the field used in the ORDER BY clause?
2. It looks weird that queryParallelism causes wrong results. It looks like you have a single-node grid and there is a bug in the queryParallelism feature.
 Also, I can't find which Ignite version you use. Could you try switching to the latest one?


--

Best regards,
Andrey V. Mashenkov
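Andrey asks in question 1 whether the data can be collocated. The following is a sketch of what that could look like, assuming the Ignite 2.x Java API. It also assumes the collocation field is FDATABASEDID, the GROUP BY field from Andrey's earlier point 3. The key class and its id field are hypothetical. Marking a key field with @AffinityKeyMapped makes all entries with the same FDATABASEDID map to the same partition, and therefore to the same node. That placement is the condition under which setCollocated(true) is safe.

```java
import java.util.Objects;
import org.apache.ignite.cache.affinity.AffinityKeyMapped;

/** Hypothetical cache key: rows with the same FDATABASEDID land on the same node. */
public class DatabaseDAssignKey {
    private final String id;            // hypothetical unique row ID

    @AffinityKeyMapped
    private final String fdatabasedid;  // collocation (affinity) field

    public DatabaseDAssignKey(String id, String fdatabasedid) {
        this.id = id;
        this.fdatabasedid = fdatabasedid;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DatabaseDAssignKey)) return false;
        DatabaseDAssignKey other = (DatabaseDAssignKey) o;
        return id.equals(other.id) && fdatabasedid.equals(other.fdatabasedid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, fdatabasedid);
    }
}
```

Switching to such a key would mean reloading the data under the new keys, which, per Lucky, takes about 4 hours for this data set.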
