You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by njcstreet <nj...@gmail.com> on 2020/06/10 10:54:43 UTC

Query using IN Clause slow

Hi,

I am seeing poor performance on queries which use IN in the WHERE clause
with multiple items supplied. I have searched for issues related to this and
I have seen replies suggesting that when you use IN, that indexes are not
used, but I guess this must have been fixed because I do see indexes being
used in the query plan (but the query is still slow). I've seen suggestions
to re-write IN as a JOIN but I've raised a similar issue where I am seeing
that Indexes aren't used on joins (unless the query is re-written in such a
way that makes it prohibitively slow).  
What I am observing is that on a table with roughly 300 million entries in
it, if I use the IN clause, the query takes a few seconds, where as if I
re-write it is a UNION with each query filtering on only one item, it
returns in milliseconds. The issue is the user could be filtering on lots of
items and on multiple attributes, so if I do this I will end up with a huge
number of unions.

Example 1 - using IN clause, filtering on two items. Returns slowly (about 8
seconds).


 SELECT product_name, location_name,

SUM(revenue)

FROM FactTableRevenue

WHERE date_key = 20200604

AND product_name IN ('Product1', 'Product2')

GROUP BY product_name, location_name

 Query plan:
SELECT

    __Z0.PRODUCT_NAME AS __C0_0,

    __Z0.LOCATION_NAME AS __C0_1,

    SUM(__Z0.REVENUE) AS __C0_2

FROM PUBLIC.FACTTABLEREVENUE __Z0

    /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: PRODUCT_NAME IN('Product1',
'Product2')

       AND DATE_KEY = 20200604

     */

WHERE (__Z0.PRODUCT_NAME IN('Product1', 'Product2'))

    AND (__Z0.DATE_KEY = 20200604)

GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME

 
Example 2 - Re-written as a union. Returns very quickly (50 ms)

SELECT product_name, location_name,

SUM(revenue)

FROM FactTableRevenue

WHERE date_key = 20200604

AND product_name IN ('Product1')

GROUP BY product_name, location_name

 

UNION ALL

 

SELECT product_name, location_name,

SUM(revenue)

FROM FactTableRevenue

WHERE date_key = 20200604

AND product_name IN ('Product2')

GROUP BY product_name, location_name

 

SELECT

    __Z0.PRODUCT_NAME AS __C0_0,

    __Z0.LOCATION_NAME AS __C0_1,

    SUM(__Z0.REVENUE) AS __C0_2

FROM PUBLIC.FACTTABLEREVENUE __Z0

    /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: DATE_KEY = 20200604

        AND PRODUCT_NAME = 'Product1'

     */

WHERE (__Z0.DATE_KEY = 20200604)

    AND (__Z0.PRODUCT_NAME = 'Product1')

GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Query using IN Clause slow

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

We had an issue with IN clause, which we thought fixed on a general case,
but if you have a complex condition it may still manifest.

Does it change if you just use (product_name = 'Product1' OR product_name =
'Product2')?

Regards,
-- 
Ilya Kasnacheev


ср, 10 июн. 2020 г. в 13:54, njcstreet <nj...@gmail.com>:

> Hi,
>
> I am seeing poor performance on queries which use IN in the WHERE clause
> with multiple items supplied. I have searched for issues related to this
> and
> I have seen replies suggesting that when you use IN, that indexes are not
> used, but I guess this must have been fixed because I do see indexes being
> used in the query plan (but the query is still slow). I've seen suggestions
> to re-write IN as a JOIN but I've raised a similar issue where I am seeing
> that Indexes aren't used on joins (unless the query is re-written in such a
> way that makes it prohibitively slow).
> What I am observing is that on a table with roughly 300 million entries in
> it, if I use the IN clause, the query takes a few seconds, where as if I
> re-write it is a UNION with each query filtering on only one item, it
> returns in milliseconds. The issue is the user could be filtering on lots
> of
> items and on multiple attributes, so if I do this I will end up with a huge
> number of unions.
>
> Example 1 - using IN clause, filtering on two items. Returns slowly (about
> 8
> seconds).
>
>
>  SELECT product_name, location_name,
>
> SUM(revenue)
>
> FROM FactTableRevenue
>
> WHERE date_key = 20200604
>
> AND product_name IN ('Product1', 'Product2')
>
> GROUP BY product_name, location_name
>
>  Query plan:
> SELECT
>
>     __Z0.PRODUCT_NAME AS __C0_0,
>
>     __Z0.LOCATION_NAME AS __C0_1,
>
>     SUM(__Z0.REVENUE) AS __C0_2
>
> FROM PUBLIC.FACTTABLEREVENUE __Z0
>
>     /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: PRODUCT_NAME IN('Product1',
> 'Product2')
>
>        AND DATE_KEY = 20200604
>
>      */
>
> WHERE (__Z0.PRODUCT_NAME IN('Product1', 'Product2'))
>
>     AND (__Z0.DATE_KEY = 20200604)
>
> GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME
>
>
> Example 2 - Re-written as a union. Returns very quickly (50 ms)
>
> SELECT product_name, location_name,
>
> SUM(revenue)
>
> FROM FactTableRevenue
>
> WHERE date_key = 20200604
>
> AND product_name IN ('Product1')
>
> GROUP BY product_name, location_name
>
>
>
> UNION ALL
>
>
>
> SELECT product_name, location_name,
>
> SUM(revenue)
>
> FROM FactTableRevenue
>
> WHERE date_key = 20200604
>
> AND product_name IN ('Product2')
>
> GROUP BY product_name, location_name
>
>
>
> SELECT
>
>     __Z0.PRODUCT_NAME AS __C0_0,
>
>     __Z0.LOCATION_NAME AS __C0_1,
>
>     SUM(__Z0.REVENUE) AS __C0_2
>
> FROM PUBLIC.FACTTABLEREVENUE __Z0
>
>     /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: DATE_KEY = 20200604
>
>         AND PRODUCT_NAME = 'Product1'
>
>      */
>
> WHERE (__Z0.DATE_KEY = 20200604)
>
>     AND (__Z0.PRODUCT_NAME = 'Product1')
>
> GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>