You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by kmandalas <Ky...@IRIworldwide.com> on 2017/05/02 13:33:02 UTC

Ignite SQL Indexing Performance problems

Hello,

I have a cache in REPLICATED mode populated with the contents of a single DB
table having ~2 million rows.

I am running a simple SQL query like:
select * from simulation_initial_values
    where category_id in (1,2,3,4,5,6)
        and geo_channel_id in (3,4,5,6) 
        and type_id=3 
        and week between 1888 and 1939;

I have indexes at all the above fields and the queries are of course
executed in local mode (since I have REPLICATED cache of read-only nature).
The performance of the query is poor, I get warnings in the log and I see
that no indexing is used.

I saw the 
https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations
<https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations>  
where it says that if a query contains IN operators  then this query will
not use indexes. I followed the alternative approach proposed by replacing
the IN clause with JOIN but still the same poor results. I paste log below:

[14:36:53,479][WARN ][pub-#86%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=7133 ms, sql='select * from
SimulationInitialValues siv  join table(id bigint = ?) c on siv.categoryId =
c.id  join table(id bigint = ?) g on siv.geoChannelId = g.id  and siv.type =
? and siv.week between ? and ?', plan=
SELECT
    SIV._KEY,
    SIV._VAL,
    SIV.ID,
    SIV.CATEGORYID,
    SIV.GEOCHANNELID,
    SIV.GEOID,
    SIV.PRODUCTID,
    SIV.PPGID,
    SIV.TYPE,
    SIV.WEEK,
    C.ID,
    G.ID
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES SIV
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = ?3 */
    /* WHERE (SIV.TYPE = ?3)
        AND ((SIV.WEEK >= ?4)
        AND (SIV.WEEK <= ?5))
    */
INNER JOIN TABLE(ID BIGINT=?1) C
    /* function: ID = SIV.CATEGORYID
        AND ID = SIV.CATEGORYID
     */
    ON 1=1
    /* WHERE SIV.CATEGORYID = C.ID
    */
INNER JOIN TABLE(ID BIGINT=?2) G
    /* function: ID = SIV.GEOCHANNELID
        AND ID = SIV.GEOCHANNELID
     */
    ON 1=1
WHERE (SIV.CATEGORYID = C.ID)
    AND (((SIV.WEEK >= ?4)
    AND (SIV.WEEK <= ?5))
    AND ((SIV.TYPE = ?3)
    AND (SIV.GEOCHANNELID = G.ID)))
, parameters=[[Ljava.lang.Object;@7346a49b, [Ljava.lang.Object;@5458bfc9, 2,
1888, 1939]]

       



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-SQL-Indexing-Performance-problems-tp12342.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Ignite SQL Indexing Performance problems

Posted by Sergi Vladykin <se...@gmail.com>.
Yes, you have to go with JOIN and create better index for your query.

Sergi

2017-05-03 16:27 GMT+03:00 kmandalas <Ky...@iriworldwide.com>:

> @Andrey: there is reason that I have selected REPLICATED over PARTITIONED
> cache due to my business case. Custom Affinity is under investigation,
> however seems over complicated for my scenario for starters. Now, even if
> PARTITIONED is used and things get done in parallel, the Network I/O is
> involved when reducing the results and in my case I get back rows with many
> columns and not a couple of SUM or AVG
>
> @Sergi: My query has an IN clause. At the following link
> https://apacheignite.readme.io/docs/sql-performance-and-
> debugging#sql-performance-and-usability-considerations
> <https://apacheignite.readme.io/docs/sql-performance-and-
> debugging#sql-performance-and-usability-considerations>
> is mentiond that: /If the query contains an IN operator, there can be two
> issues: First, it is impossible to provide a variable list of parameters.
> That means that you have to specify the exact list in the query, for
> example, where id in (?, ?, ?). You cannot write - where id in ? and pass
> an
> array or collection. Second, this query will not use indexes. /
>
> So, let's say I create a grouped Index. Will it cover me from the case
> above? Or I will have to go with JOIN?
>
>
>
> --
> View this message in context: http://apache-ignite-users.
> 70518.x6.nabble.com/Ignite-SQL-Indexing-Performance-
> problems-tp12342p12391.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>

Re: Ignite SQL Indexing Performance problems

Posted by kmandalas <Ky...@IRIworldwide.com>.
@Andrey: there is reason that I have selected REPLICATED over PARTITIONED
cache due to my business case. Custom Affinity is under investigation,
however seems over complicated for my scenario for starters. Now, even if
PARTITIONED is used and things get done in parallel, the Network I/O is
involved when reducing the results and in my case I get back rows with many
columns and not a couple of SUM or AVG

@Sergi: My query has an IN clause. At the following link 
https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations
<https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations>  
is mentiond that: /If the query contains an IN operator, there can be two
issues: First, it is impossible to provide a variable list of parameters.
That means that you have to specify the exact list in the query, for
example, where id in (?, ?, ?). You cannot write - where id in ? and pass an
array or collection. Second, this query will not use indexes. /

So, let's say I create a grouped Index. Will it cover me from the case
above? Or I will have to go with JOIN?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-SQL-Indexing-Performance-problems-tp12342p12391.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Ignite SQL Indexing Performance problems

Posted by Andrey Mashenkov <an...@gmail.com>.
Sergi,

AFAIK, you suggest to make query run on replicated cache in similar way as
on partitioned.
Do you mean that query will run on one node in single thread as it works
now?

On Wed, May 3, 2017 at 3:39 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> Andrey,
>
> In 2.0 queries over replicated cache still work in a single thread.
>
> Kyriakos,
>
> I see that only index on TYPE is used which is not the most selective
> obviously.
>
> I suggest to create a group index on (TYPE, CATEGORYID, GEOCHANNELID,
> WEEK) and make sure it is used in query plan.
>
> Sergi
>
>
> 2017-05-03 15:02 GMT+03:00 Andrey Mashenkov <an...@gmail.com>:
>
>> Hi,
>>
>> As you use REPLICATED cache, the query will run on one node in single
>> thread. This should be fixed in 2.0
>> The only workaround for now is to convert cache to PARTITIONED, to make
>> the query to be run on several nodes, that will significantly increase
>> performance.
>>
>> Also, you can try to add more field in index "PPSP-IMDG-CACHE"."type_
>> idx".
>>
>> On Tue, May 2, 2017 at 4:33 PM, kmandalas <Kyriakos.Mandalas@iriworldwid
>> e.com> wrote:
>>
>>> Hello,
>>>
>>> I have a cache in REPLICATED mode populated with the contents of a
>>> single DB
>>> table having ~2 million rows.
>>>
>>> I am running a simple SQL query like:
>>> select * from simulation_initial_values
>>>     where category_id in (1,2,3,4,5,6)
>>>         and geo_channel_id in (3,4,5,6)
>>>         and type_id=3
>>>         and week between 1888 and 1939;
>>>
>>> I have indexes at all the above fields and the queries are of course
>>> executed in local mode (since I have REPLICATED cache of read-only
>>> nature).
>>> The performance of the query is poor, I get warnings in the log and I see
>>> that no indexing is used.
>>>
>>> I saw the
>>> https://apacheignite.readme.io/docs/sql-performance-and-debu
>>> gging#sql-performance-and-usability-considerations
>>> <https://apacheignite.readme.io/docs/sql-performance-and-deb
>>> ugging#sql-performance-and-usability-considerations>
>>> where it says that if a query contains IN operators  then this query will
>>> not use indexes. I followed the alternative approach proposed by
>>> replacing
>>> the IN clause with JOIN but still the same poor results. I paste log
>>> below:
>>>
>>> [14:36:53,479][WARN ][pub-#86%ppsp-cluster-IV%][IgniteH2Indexing] Query
>>> execution is too long [time=7133 ms, sql='select * from
>>> SimulationInitialValues siv  join table(id bigint = ?) c on
>>> siv.categoryId =
>>> c.id  join table(id bigint = ?) g on siv.geoChannelId = g.id  and
>>> siv.type =
>>> ? and siv.week between ? and ?', plan=
>>> SELECT
>>>     SIV._KEY,
>>>     SIV._VAL,
>>>     SIV.ID,
>>>     SIV.CATEGORYID,
>>>     SIV.GEOCHANNELID,
>>>     SIV.GEOID,
>>>     SIV.PRODUCTID,
>>>     SIV.PPGID,
>>>     SIV.TYPE,
>>>     SIV.WEEK,
>>>     C.ID,
>>>     G.ID
>>> FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES SIV
>>>     /* "PPSP-IMDG-CACHE"."type_idx": TYPE = ?3 */
>>>     /* WHERE (SIV.TYPE = ?3)
>>>         AND ((SIV.WEEK >= ?4)
>>>         AND (SIV.WEEK <= ?5))
>>>     */
>>> INNER JOIN TABLE(ID BIGINT=?1) C
>>>     /* function: ID = SIV.CATEGORYID
>>>         AND ID = SIV.CATEGORYID
>>>      */
>>>     ON 1=1
>>>     /* WHERE SIV.CATEGORYID = C.ID
>>>     */
>>> INNER JOIN TABLE(ID BIGINT=?2) G
>>>     /* function: ID = SIV.GEOCHANNELID
>>>         AND ID = SIV.GEOCHANNELID
>>>      */
>>>     ON 1=1
>>> WHERE (SIV.CATEGORYID = C.ID)
>>>     AND (((SIV.WEEK >= ?4)
>>>     AND (SIV.WEEK <= ?5))
>>>     AND ((SIV.TYPE = ?3)
>>>     AND (SIV.GEOCHANNELID = G.ID)))
>>> , parameters=[[Ljava.lang.Object;@7346a49b, [Ljava.lang.Object;@5458bfc9,
>>> 2,
>>> 1888, 1939]]
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context: http://apache-ignite-users.705
>>> 18.x6.nabble.com/Ignite-SQL-Indexing-Performance-problems-tp12342.html
>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
>


-- 
Best regards,
Andrey V. Mashenkov

Re: Ignite SQL Indexing Performance problems

Posted by Sergi Vladykin <se...@gmail.com>.
Andrey,

In 2.0 queries over replicated cache still work in a single thread.

Kyriakos,

I see that only index on TYPE is used which is not the most selective
obviously.

I suggest to create a group index on (TYPE, CATEGORYID, GEOCHANNELID, WEEK)
and make sure it is used in query plan.

Sergi


2017-05-03 15:02 GMT+03:00 Andrey Mashenkov <an...@gmail.com>:

> Hi,
>
> As you use REPLICATED cache, the query will run on one node in single
> thread. This should be fixed in 2.0
> The only workaround for now is to convert cache to PARTITIONED, to make
> the query to be run on several nodes, that will significantly increase
> performance.
>
> Also, you can try to add more field in index "PPSP-IMDG-CACHE"."type_idx".
>
> On Tue, May 2, 2017 at 4:33 PM, kmandalas <Kyriakos.Mandalas@
> iriworldwide.com> wrote:
>
>> Hello,
>>
>> I have a cache in REPLICATED mode populated with the contents of a single
>> DB
>> table having ~2 million rows.
>>
>> I am running a simple SQL query like:
>> select * from simulation_initial_values
>>     where category_id in (1,2,3,4,5,6)
>>         and geo_channel_id in (3,4,5,6)
>>         and type_id=3
>>         and week between 1888 and 1939;
>>
>> I have indexes at all the above fields and the queries are of course
>> executed in local mode (since I have REPLICATED cache of read-only
>> nature).
>> The performance of the query is poor, I get warnings in the log and I see
>> that no indexing is used.
>>
>> I saw the
>> https://apacheignite.readme.io/docs/sql-performance-and-debu
>> gging#sql-performance-and-usability-considerations
>> <https://apacheignite.readme.io/docs/sql-performance-and-deb
>> ugging#sql-performance-and-usability-considerations>
>> where it says that if a query contains IN operators  then this query will
>> not use indexes. I followed the alternative approach proposed by replacing
>> the IN clause with JOIN but still the same poor results. I paste log
>> below:
>>
>> [14:36:53,479][WARN ][pub-#86%ppsp-cluster-IV%][IgniteH2Indexing] Query
>> execution is too long [time=7133 ms, sql='select * from
>> SimulationInitialValues siv  join table(id bigint = ?) c on
>> siv.categoryId =
>> c.id  join table(id bigint = ?) g on siv.geoChannelId = g.id  and
>> siv.type =
>> ? and siv.week between ? and ?', plan=
>> SELECT
>>     SIV._KEY,
>>     SIV._VAL,
>>     SIV.ID,
>>     SIV.CATEGORYID,
>>     SIV.GEOCHANNELID,
>>     SIV.GEOID,
>>     SIV.PRODUCTID,
>>     SIV.PPGID,
>>     SIV.TYPE,
>>     SIV.WEEK,
>>     C.ID,
>>     G.ID
>> FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES SIV
>>     /* "PPSP-IMDG-CACHE"."type_idx": TYPE = ?3 */
>>     /* WHERE (SIV.TYPE = ?3)
>>         AND ((SIV.WEEK >= ?4)
>>         AND (SIV.WEEK <= ?5))
>>     */
>> INNER JOIN TABLE(ID BIGINT=?1) C
>>     /* function: ID = SIV.CATEGORYID
>>         AND ID = SIV.CATEGORYID
>>      */
>>     ON 1=1
>>     /* WHERE SIV.CATEGORYID = C.ID
>>     */
>> INNER JOIN TABLE(ID BIGINT=?2) G
>>     /* function: ID = SIV.GEOCHANNELID
>>         AND ID = SIV.GEOCHANNELID
>>      */
>>     ON 1=1
>> WHERE (SIV.CATEGORYID = C.ID)
>>     AND (((SIV.WEEK >= ?4)
>>     AND (SIV.WEEK <= ?5))
>>     AND ((SIV.TYPE = ?3)
>>     AND (SIV.GEOCHANNELID = G.ID)))
>> , parameters=[[Ljava.lang.Object;@7346a49b, [Ljava.lang.Object;@5458bfc9,
>> 2,
>> 1888, 1939]]
>>
>>
>>
>>
>>
>> --
>> View this message in context: http://apache-ignite-users.705
>> 18.x6.nabble.com/Ignite-SQL-Indexing-Performance-problems-tp12342.html
>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: Ignite SQL Indexing Performance problems

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

As you use REPLICATED cache, the query will run on one node in single
thread. This should be fixed in 2.0
The only workaround for now is to convert cache to PARTITIONED, to make the
query to be run on several nodes, that will significantly increase
performance.

Also, you can try to add more field in index "PPSP-IMDG-CACHE"."type_idx".

On Tue, May 2, 2017 at 4:33 PM, kmandalas <
Kyriakos.Mandalas@iriworldwide.com> wrote:

> Hello,
>
> I have a cache in REPLICATED mode populated with the contents of a single
> DB
> table having ~2 million rows.
>
> I am running a simple SQL query like:
> select * from simulation_initial_values
>     where category_id in (1,2,3,4,5,6)
>         and geo_channel_id in (3,4,5,6)
>         and type_id=3
>         and week between 1888 and 1939;
>
> I have indexes at all the above fields and the queries are of course
> executed in local mode (since I have REPLICATED cache of read-only nature).
> The performance of the query is poor, I get warnings in the log and I see
> that no indexing is used.
>
> I saw the
> https://apacheignite.readme.io/docs/sql-performance-and-
> debugging#sql-performance-and-usability-considerations
> <https://apacheignite.readme.io/docs/sql-performance-and-
> debugging#sql-performance-and-usability-considerations>
> where it says that if a query contains IN operators  then this query will
> not use indexes. I followed the alternative approach proposed by replacing
> the IN clause with JOIN but still the same poor results. I paste log below:
>
> [14:36:53,479][WARN ][pub-#86%ppsp-cluster-IV%][IgniteH2Indexing] Query
> execution is too long [time=7133 ms, sql='select * from
> SimulationInitialValues siv  join table(id bigint = ?) c on siv.categoryId
> =
> c.id  join table(id bigint = ?) g on siv.geoChannelId = g.id  and
> siv.type =
> ? and siv.week between ? and ?', plan=
> SELECT
>     SIV._KEY,
>     SIV._VAL,
>     SIV.ID,
>     SIV.CATEGORYID,
>     SIV.GEOCHANNELID,
>     SIV.GEOID,
>     SIV.PRODUCTID,
>     SIV.PPGID,
>     SIV.TYPE,
>     SIV.WEEK,
>     C.ID,
>     G.ID
> FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES SIV
>     /* "PPSP-IMDG-CACHE"."type_idx": TYPE = ?3 */
>     /* WHERE (SIV.TYPE = ?3)
>         AND ((SIV.WEEK >= ?4)
>         AND (SIV.WEEK <= ?5))
>     */
> INNER JOIN TABLE(ID BIGINT=?1) C
>     /* function: ID = SIV.CATEGORYID
>         AND ID = SIV.CATEGORYID
>      */
>     ON 1=1
>     /* WHERE SIV.CATEGORYID = C.ID
>     */
> INNER JOIN TABLE(ID BIGINT=?2) G
>     /* function: ID = SIV.GEOCHANNELID
>         AND ID = SIV.GEOCHANNELID
>      */
>     ON 1=1
> WHERE (SIV.CATEGORYID = C.ID)
>     AND (((SIV.WEEK >= ?4)
>     AND (SIV.WEEK <= ?5))
>     AND ((SIV.TYPE = ?3)
>     AND (SIV.GEOCHANNELID = G.ID)))
> , parameters=[[Ljava.lang.Object;@7346a49b, [Ljava.lang.Object;@5458bfc9,
> 2,
> 1888, 1939]]
>
>
>
>
>
> --
> View this message in context: http://apache-ignite-users.
> 70518.x6.nabble.com/Ignite-SQL-Indexing-Performance-problems-tp12342.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>



-- 
Best regards,
Andrey V. Mashenkov