You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by mhetea <mi...@gmail.com> on 2017/08/29 09:09:32 UTC

SQL query is slow

We have a cache with ~3million entries

We execute the following query:
select count(_key) from IgniteProduct as product where POSITION
('Z',manufacturerCode)>0

The query takes about 7s though the index is hit.

Query execution is too long [time=7103 ms, sql='SELECT
COUNT(PRODUCT__Z0._KEY) __C0_0
FROM "productCache".IGNITEPRODUCT PRODUCT__Z0
WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0', plan=
SELECT
    COUNT(PRODUCT__Z0._KEY) AS __C0_0
FROM "productCache".IGNITEPRODUCT PRODUCT__Z0
    /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */
WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0
, parameters=[]]
, parameters=[]]

Can somebody help/offer an alternative?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/SQL-query-is-slow-tp16475.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: SQL query is slow

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Hi Mihaela,

Index is not used in your case because you specify function-based
condition. Usually this is resolved by adding functional index, but Ignite
doesn't support it at the moment unfortunately. Is it possible to
"materialize" the condition "POSITION ('Z',manufacturerCode)>0" as
additional attribute and add an index on it? In this case SQL would look
like this and index will be used:

SELECT COUNT(_KEY) FROM IgniteProduct AS product
WHERE manufacturerCodeZ=1

Another important thing is selectivity - which fraction of records fall
under this condition?
Also I would recommend to change "COUNT(_KEY)" to "COUNT(*)".

Vladimir.

On Tue, Aug 29, 2017 at 6:05 PM, Andrey Mashenkov <
andrey.mashenkov@gmail.com> wrote:

> It is possible returned dataset is too large and cause high network
> pressure that results in large query execution time.
>
> There is no recommendation for grid nodes count.
> Simple SQL queries can work slower on large grid as most of time is spent
> in inter-node communication.
> Heavy SQL queries may show better results on larger grid as every node
> will have smaller dataset.
>
> You can try to look at page memory statistics [1] to get estimate numbers.
>
> Really, there is an issue with large OFFSET as Ignite can't just skip
> entries and have to fetch all of them from nodes.
> OFFSET makes no sense without ORDER as Ignite fetch rows from other nodes
> in async way and row order should be preserved between such queries.
> OFFSET applies on query initiator node (reduce side) after results merged
> as there is no way to understand on map side what rows should be skiped.
>
>
> Looks like underlying H2 tries to use index scan, but I don't think index
> can help in case of functional condition.
> You can try to make Ignite to have inline values in index or use separate
> field with smaller type that can be inlined. By default, index inlining is
> enabled for 10 byte length values.
> See IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT system property docs and [2].
>
> [1] https://apacheignite.readme.io/v2.1/docs/memory-metrics
> [2] https://issues.apache.org/jira/browse/IGNITE-6060
>
> On Tue, Aug 29, 2017 at 3:59 PM, mhetea <mi...@gmail.com> wrote:
>
>> Thank you for your response.
>> I used query parallelizm and the time reduced to ~2.3s, which is still too
>> much.
>> Regarding 1. is there any documentation about configuration parameters
>> (recommended number of nodes, how much data should be stored on each
>> node).
>> We currently have 2 nodes with 32GB RAM each. Every 1 million records from
>> our cache occupy about 1GB (is there a way to see how much memory a cache
>> actually occupies? we look now at the Allocated next memory segment log
>> info)
>> For 3. it seems that the index is hit  from the execution plan:
>>  /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */
>> No?
>>
>> We have this issue also when we use a large OFFSET (we execute this kind
>> of
>> query because we want paginated results)
>>
>> Also, this cache will be updated frequently so we expect it to grow in
>> size.
>>
>> Thank you!
>>
>>
>>
>> --
>> View this message in context: http://apache-ignite-users.705
>> 18.x6.nabble.com/SQL-query-is-slow-tp16475p16487.html
>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: SQL query is slow

Posted by Andrey Mashenkov <an...@gmail.com>.

It is possible returned dataset is too large and cause high network
pressure that results in large query execution time.

There is no recommendation for grid nodes count.
Simple SQL queries can work slower on large grid as most of time is spent
in inter-node communication.
Heavy SQL queries may show better results on larger grid as every node will
have smaller dataset.

You can try to look at page memory statistics [1] to get estimate numbers.

Really, there is an issue with large OFFSET as Ignite can't just skip
entries and have to fetch all of them from nodes.
OFFSET makes no sense without ORDER as Ignite fetch rows from other nodes
in async way and row order should be preserved between such queries.
OFFSET applies on query initiator node (reduce side) after results merged
as there is no way to understand on map side what rows should be skiped.

Looks like underlying H2 tries to use index scan, but I don't think index
can help in case of functional condition.
You can try to make Ignite to have inline values in index or use separate
field with smaller type that can be inlined. By default, index inlining is
enabled for 10 byte length values.
See IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT system property docs and [2].

[1] https://apacheignite.readme.io/v2.1/docs/memory-metrics
[2] https://issues.apache.org/jira/browse/IGNITE-6060

On Tue, Aug 29, 2017 at 3:59 PM, mhetea <mi...@gmail.com> wrote:

> Thank you for your response.
> I used query parallelizm and the time reduced to ~2.3s, which is still too
> much.
> Regarding 1. is there any documentation about configuration parameters
> (recommended number of nodes, how much data should be stored on each node).
> We currently have 2 nodes with 32GB RAM each. Every 1 million records from
> our cache occupy about 1GB (is there a way to see how much memory a cache
> actually occupies? we look now at the Allocated next memory segment log
> info)
> For 3. it seems that the index is hit  from the execution plan:
>  /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */
> No?
>
> We have this issue also when we use a large OFFSET (we execute this kind of
> query because we want paginated results)
>
> Also, this cache will be updated frequently so we expect it to grow in
> size.
>
> Thank you!
>
>
>
> --
> View this message in context: http://apache-ignite-users.
> 70518.x6.nabble.com/SQL-query-is-slow-tp16475p16487.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>

-- 
Best regards,
Andrey V. Mashenkov

Re: SQL query is slow

Posted by mhetea <mi...@gmail.com>.

Thank you for your response. 
I used query parallelizm and the time reduced to ~2.3s, which is still too
much. 
Regarding 1. is there any documentation about configuration parameters
(recommended number of nodes, how much data should be stored on each node).
We currently have 2 nodes with 32GB RAM each. Every 1 million records from
our cache occupy about 1GB (is there a way to see how much memory a cache
actually occupies? we look now at the Allocated next memory segment log
info)
For 3. it seems that the index is hit  from the execution plan:
 /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */  
No?

We have this issue also when we use a large OFFSET (we execute this kind of
query because we want paginated results) 

Also, this cache will be updated frequently so we expect it to grow in size.  

Thank you! 



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/SQL-query-is-slow-tp16475p16487.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: SQL query is slow

Posted by Andrey Mashenkov <an...@gmail.com>.

Hi,

 You can try to
1. add more nodes to reduce data set on each node.
2. use query parallelizm [1].
3. add a indexed field to be used in WHERE clause instead of function.

[1]
https://apacheignite.readme.io/docs/sql-performance-and-debugging#query-parallelism

On Tue, Aug 29, 2017 at 12:09 PM, mhetea <mi...@gmail.com> wrote:

> We have a cache with ~3million entries
>
> We execute the following query:
> select count(_key) from IgniteProduct as product where POSITION
> ('Z',manufacturerCode)>0
>
> The query takes about 7s though the index is hit.
>
> Query execution is too long [time=7103 ms, sql='SELECT
> COUNT(PRODUCT__Z0._KEY) __C0_0
> FROM "productCache".IGNITEPRODUCT PRODUCT__Z0
> WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0', plan=
> SELECT
>     COUNT(PRODUCT__Z0._KEY) AS __C0_0
> FROM "productCache".IGNITEPRODUCT PRODUCT__Z0
>     /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */
> WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0
> , parameters=[]]
> , parameters=[]]
>
> Can somebody help/offer an alternative?
>
>
>
> --
> View this message in context: http://apache-ignite-users.
> 70518.x6.nabble.com/SQL-query-is-slow-tp16475.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>



-- 
Best regards,
Andrey V. Mashenkov