You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by yongjec <yo...@gmail.com> on 2018/11/28 22:45:19 UTC

Slow select distinct query on primary key

I am running below SQL query via Sqlline.sh, and I think it is running too
slow (57s). Could someone confirm whether this response time is normal, or I
am doing something wrong?

Here is the query:

0: jdbc:ignite:thin://127.0.0.1/> SELECT DISTINCT ACCOUNT_ID FROM
PERF_POSITIONS;
+--------------------------------+
|           ACCOUNT_ID           |
+--------------------------------+
| 1684                           |
| 1201                           |
| 1686                           |
...
...
| 1441                           |
+--------------------------------+
1,001 rows selected (57.453 seconds)


My setup is a single Azure VM (CentOS 7) with 16 Cores and 64GB RAM. The
host is idle other than the Ignite process.

My Dataset has 50 million rows with a total of 1001 distinct ACCOUNT_ID
values. Rows are almost evenly distributed among the account_id's. As you
can see in the below table definition, ACCOUNT_ID is first column of the
primary key and the index.

CREATE TABLE PERF_POSITIONS (
    ACCOUNT_ID VARCHAR(50) NOT NULL,
    EFFECTIVE_DATE DATE NOT NULL,
    FREQUENCY CHAR(1) NOT NULL,
    SOURCE_ID INTEGER NOT NULL,
    SECURITY_ALIAS BIGINT NOT NULL,
    POSITION_TYPE VARCHAR(10),
    IT VARCHAR(50),
    IN VARCHAR(255),
    PAI VARCHAR(100),
    TIC VARCHAR(100),
    GR DOUBLE,
    NR DOUBLE,
    GRL DOUBLE,
    IR DOUBLE,
    ABAL DOUBLE,
    BG DOUBLE,
    EG DOUBLE,
    CGD DOUBLE,
    CGC DOUBLE,
    CFG DOUBLE,
    GLG DOUBLE,
    IB DOUBLE,
    WT DOUBLE,
    BL DOUBLE,
    EL DOUBLE,
    CDL DOUBLE,
    CCL DOUBLE,
    CL DOUBLE,
    GLL DOUBLE,
    IBL DOUBLE,
    NC DOUBLE,
    BP DOUBLE,
    EP DOUBLE,
    CP DOUBLE,
    PR DOUBLE,
    BAI DOUBLE,
    EAI DOUBLE,
    CI DOUBLE,
    SF VARCHAR(10),
    US VARCHAR(255),
    UD DATE,
    PRIMARY KEY (ACCOUNT_ID, EFFECTIVE_DATE, FREQUENCY, SOURCE_ID,
SECURITY_ALIAS, POSITION_TYPE)
)
WITH "template=partitioned, backups=1, affinityKey=ACCOUNT_ID,
KEY_TYPE=ie.models.PerfPositionKey, VALUE_TYPE=ie.models.PerfPosition";
CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID,
EFFECTIVE_DATE, FREQUENCY, SOURCE_ID, SECURITY_ALIAS, POSITION_TYPE);


When I run the query, I see below warning showing up in the log.

[22:32:51,598][WARNING][client-connector-#136][IgniteH2Indexing] Query
execution is too long [time=57260 ms, sql='SELECT DISTINCT
__Z0.ACCOUNT_ID __C0_0
FROM PUBLIC.PERF_POSITIONS __Z0', plan=
SELECT DISTINCT
    __Z0.ACCOUNT_ID AS __C0_0
FROM PUBLIC.PERF_POSITIONS __Z0
    /* PUBLIC."_key_PK" */
, parameters=[]]






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Slow select distinct query on primary key

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

> only 1 out of the 16 available cores get spiked to 100%, while the rest
remain idle

This is to be expected unless you crank query parallelism up:
https://apacheignite.readme.io/docs/sql-performance-and-debugging#query-parallelism

Unfortunately, the setting is only available via cache configuration/cache
template.

Regards,
-- 
Ilya Kasnacheev


пт, 30 нояб. 2018 г. в 17:43, yongjec <yo...@gmail.com>:

> Here is the explain plan.
>
> 0: jdbc:ignite:thin://127.0.0.1/> EXPLAIN SELECT DISTINCT ACCOUNT_ID FROM
> PERF_POSITIONS;
> 'PLAN'
> 'SELECT DISTINCT
>     __Z0.ACCOUNT_ID AS __C0_0
> FROM PUBLIC.PERF_POSITIONS __Z0
>     /* PUBLIC."_key_PK" */'
> 'SELECT DISTINCT
>     __C0_0 AS ACCOUNT_ID
> FROM PUBLIC.__T0
>     /* PUBLIC."merge_scan" */'
> 2 rows selected (0.026 seconds)
>
>
> Based on your suggestion, I tested below changes, but none of them made a
> difference. In all cases, the query took consistently 56-60 seconds.
>
>
> 1. Having the index with inline size 60.
>
> CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID) INLINE_SIZE
> 60;
>
>
> 2. Re-creating the table with VARCHAR size 4. (all the values in this
> particular dataset are 4 chars).
>
> CREATE TABLE PERF_POSITIONS (
>     ACCOUNT_ID VARCHAR(4) NOT NULL,
> ...
>
>
> 3. Using index hint.
>
> SELECT DISTINCT ACCOUNT_ID FROM PERF_POSITIONS USE
> INDEX(PERF_POSITIONS_IDX);
>
>
> On a side note, I noticed that while the query is running, only 1 out of
> the
> 16 available cores get spiked to 100%, while the rest remain idle. Not sure
> whether this is expected.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Slow select distinct query on primary key

Posted by yongjec <yo...@gmail.com>.
Here is the explain plan.

0: jdbc:ignite:thin://127.0.0.1/> EXPLAIN SELECT DISTINCT ACCOUNT_ID FROM
PERF_POSITIONS;
'PLAN'
'SELECT DISTINCT
    __Z0.ACCOUNT_ID AS __C0_0
FROM PUBLIC.PERF_POSITIONS __Z0
    /* PUBLIC."_key_PK" */'
'SELECT DISTINCT
    __C0_0 AS ACCOUNT_ID
FROM PUBLIC.__T0
    /* PUBLIC."merge_scan" */'
2 rows selected (0.026 seconds)


Based on your suggestion, I tested below changes, but none of them made a
difference. In all cases, the query took consistently 56-60 seconds.


1. Having the index with inline size 60.

CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID) INLINE_SIZE
60;


2. Re-creating the table with VARCHAR size 4. (all the values in this
particular dataset are 4 chars).

CREATE TABLE PERF_POSITIONS (
    ACCOUNT_ID VARCHAR(4) NOT NULL,
...


3. Using index hint.

SELECT DISTINCT ACCOUNT_ID FROM PERF_POSITIONS USE
INDEX(PERF_POSITIONS_IDX);


On a side note, I noticed that while the query is running, only 1 out of the
16 available cores get spiked to 100%, while the rest remain idle. Not sure
whether this is expected.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Slow select distinct query on primary key

Posted by Юрий <ju...@gmail.com>.
Andrew, 60 will be enough also :) It was just quick calculated value with
rounding.

Real inline size for the case is 53: 1 /* byte,  type code */ + 2 /* short,
length of array */ 50 /* data size for ANSI chars */

пт, 30 нояб. 2018 г. в 14:09, Andrey Mashenkov <an...@gmail.com>:

> Yuri, how did you get inline size 60?
> I'd think 55 should be enough to inline Account_ID. 55 = 1 /* byte, type
> code */ + 4 /* int, array lenght */ + 50 /* data size for ANSI chars */
>
> On Fri, Nov 30, 2018 at 1:25 PM Юрий <ju...@gmail.com> wrote:
>
>> Please provide explain plan of the query to check that index is use. *EXPLAIN
>> {your select statement}*
>>
>> Also I noticed ACCOUNT_ID have length 50. Need to increase of inline
>> index size for the index.
>>
>> Try create index with the follow command *CREATE INDEX
>> PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID) INLINE_SIZE 60;*
>>
>> чт, 29 нояб. 2018 г. в 16:47, yongjec <yo...@gmail.com>:
>>
>>> Hi,
>>>
>>> I tried the additional index as you suggested, but it did not improve the
>>> query time. The query still takes 58-61 seconds.
>>>
>>> CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);
>>> CREATE INDEX PERF_POSITIONS_IDX2 ON PERF_POSITIONS (ACCOUNT_ID,
>>> EFFECTIVE_DATE, FREQUENCY, SOURCE_ID, SECURITY_ALIAS, POSITION_TYPE);
>>>
>>>
>>> I also tried the single column index only without the composite index.
>>> That
>>> did not make any difference in query time, either.
>>>
>>> CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>
>>
>> --
>> Живи с улыбкой! :D
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 
Живи с улыбкой! :D

Re: Slow select distinct query on primary key

Posted by Andrey Mashenkov <an...@gmail.com>.
Yuri, how did you get inline size 60?
I'd think 55 should be enough to inline Account_ID. 55 = 1 /* byte, type
code */ + 4 /* int, array lenght */ + 50 /* data size for ANSI chars */

On Fri, Nov 30, 2018 at 1:25 PM Юрий <ju...@gmail.com> wrote:

> Please provide explain plan of the query to check that index is use. *EXPLAIN
> {your select statement}*
>
> Also I noticed ACCOUNT_ID have length 50. Need to increase of inline index
> size for the index.
>
> Try create index with the follow command *CREATE INDEX PERF_POSITIONS_IDX
> ON PERF_POSITIONS (ACCOUNT_ID) INLINE_SIZE 60;*
>
> чт, 29 нояб. 2018 г. в 16:47, yongjec <yo...@gmail.com>:
>
>> Hi,
>>
>> I tried the additional index as you suggested, but it did not improve the
>> query time. The query still takes 58-61 seconds.
>>
>> CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);
>> CREATE INDEX PERF_POSITIONS_IDX2 ON PERF_POSITIONS (ACCOUNT_ID,
>> EFFECTIVE_DATE, FREQUENCY, SOURCE_ID, SECURITY_ALIAS, POSITION_TYPE);
>>
>>
>> I also tried the single column index only without the composite index.
>> That
>> did not make any difference in query time, either.
>>
>> CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>
>
> --
> Живи с улыбкой! :D
>


-- 
Best regards,
Andrey V. Mashenkov

Re: Slow select distinct query on primary key

Posted by Юрий <ju...@gmail.com>.
Please provide explain plan of the query to check that index is use. *EXPLAIN
{your select statement}*

Also I noticed ACCOUNT_ID have length 50. Need to increase of inline index
size for the index.

Try create index with the follow command *CREATE INDEX PERF_POSITIONS_IDX
ON PERF_POSITIONS (ACCOUNT_ID) INLINE_SIZE 60;*

чт, 29 нояб. 2018 г. в 16:47, yongjec <yo...@gmail.com>:

> Hi,
>
> I tried the additional index as you suggested, but it did not improve the
> query time. The query still takes 58-61 seconds.
>
> CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);
> CREATE INDEX PERF_POSITIONS_IDX2 ON PERF_POSITIONS (ACCOUNT_ID,
> EFFECTIVE_DATE, FREQUENCY, SOURCE_ID, SECURITY_ALIAS, POSITION_TYPE);
>
>
> I also tried the single column index only without the composite index. That
> did not make any difference in query time, either.
>
> CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


-- 
Живи с улыбкой! :D

Re: Slow select distinct query on primary key

Posted by yongjec <yo...@gmail.com>.
Hi,

I tried the additional index as you suggested, but it did not improve the
query time. The query still takes 58-61 seconds.

CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);
CREATE INDEX PERF_POSITIONS_IDX2 ON PERF_POSITIONS (ACCOUNT_ID,
EFFECTIVE_DATE, FREQUENCY, SOURCE_ID, SECURITY_ALIAS, POSITION_TYPE);


I also tried the single column index only without the composite index. That
did not make any difference in query time, either.

CREATE INDEX PERF_POSITIONS_IDX ON PERF_POSITIONS (ACCOUNT_ID);



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Slow select distinct query on primary key

Posted by Юрий <ju...@gmail.com>.
Hi,

There is some issue to use indexes for complex PK, which is already fixed
and will be included into 2.8.
https://issues.apache.org/jira/browse/IGNITE-8386
As of now to fix your performance issue you can create separate index for
ACCOUNT_ID column.


чт, 29 нояб. 2018 г. в 02:24, yongjec <yo...@gmail.com>:

> Here is my Ignite server configuration.
>
> <beans xmlns="http://www.springframework.org/schema/beans"
>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="
>        http://www.springframework.org/schema/beans
>        http://www.springframework.org/schema/beans/spring-beans.xsd">
>
>     <bean id="grid.cfg"
> class="org.apache.ignite.configuration.IgniteConfiguration">
>
>
>         <property name="dataStorageConfiguration">
>             <bean
> class="org.apache.ignite.configuration.DataStorageConfiguration">
>
>                 <property name="defaultDataRegionConfiguration">
>                     <bean
> class="org.apache.ignite.configuration.DataRegionConfiguration">
>                         <property name="name" value="Default_Region"/>
>
>                         <property name="maxSize" value="#{48L * 1024 * 1024
> * 1024}"/>
>
>                         <property name="metricsEnabled" value="true"/>
>                     </bean>
>                 </property>
>             </bean>
>         </property>
>
>
>         <property name="peerClassLoadingEnabled" value="true"/>
>
>         <property name="discoverySpi">
>             <bean
> class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>                 <property name="ipFinder">
>                     <bean
>
> class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>                         <property name="addresses">
>                             <list>
>                                 <value>127.0.0.1:47500..47509</value>
>                             </list>
>                         </property>
>                     </bean>
>                 </property>
>             </bean>
>         </property>
>
>         <property name="communicationSpi">
>             <bean
> class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>                 <property name="messageQueueLimit" value="256"/>
>             </bean>
>         </property>
>
>
>     </bean>
> </beans>
>
>
> And here are my JVM heap flags.
>
> ignite.sh -J-Xms8g -J-Xmx16g
> /home/ansible/ignite/apache-ignite-fabric-2.6.0-bin/config/poc1.xml
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


-- 
Живи с улыбкой! :D

Re: Slow select distinct query on primary key

Posted by yongjec <yo...@gmail.com>.
Here is my Ignite server configuration.

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">
    
    <bean id="grid.cfg"
class="org.apache.ignite.configuration.IgniteConfiguration">

        
        <property name="dataStorageConfiguration">
            <bean
class="org.apache.ignite.configuration.DataStorageConfiguration">
                
                <property name="defaultDataRegionConfiguration">
                    <bean
class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="name" value="Default_Region"/>
                        
                        <property name="maxSize" value="#{48L * 1024 * 1024
* 1024}"/>
                        
                        <property name="metricsEnabled" value="true"/>
                    </bean>
                </property>
            </bean>
        </property>

        
        <property name="peerClassLoadingEnabled" value="true"/>

        <property name="discoverySpi">
            <bean
class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean
class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <value>127.0.0.1:47500..47509</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>

        <property name="communicationSpi">
            <bean
class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="messageQueueLimit" value="256"/>
            </bean>
        </property>

        
    </bean>
</beans>


And here are my JVM heap flags.

ignite.sh -J-Xms8g -J-Xmx16g
/home/ansible/ignite/apache-ignite-fabric-2.6.0-bin/config/poc1.xml




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/