You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2017/05/07 22:47:47 UTC

GroupBy with index is really slow.

I have a simple cache where I have Person object (id, age and so on..)

I have created indeces by id and age to try some examples but queries with
"group by" go really slow.

I'm trying this query:
SELECT age
FROM
PersonWithindex
group by age


SELECT
    AGE AS __C0
FROM "personCacheWithIndex".PERSONWITHINDEX
    /* "personCacheWithIndex"."age_idx" */
GROUP BY AGE
/* group sorted */

SELECT
    __C0 AS AGE
FROM PUBLIC.__T0
    /* "personCacheWithIndex"."merge_scan" */
GROUP BY __C0

Although it seems that it uses index, why is it going so slow? I think that
it should be pretty fast with an index.

Re: GroupBy with index is really slow.

Posted by afedotov <al...@gmail.com>.

Hi.

I think the problem is that age and month have bad selectivity. For
example, in the case of month, each month index will select in average
2000000 / 12 ~ 1666667 rows.
In this case, it's probably faster scan all the rows in storage/disk order
then using the index order.

Kind regards,
Alex.

On Wed, May 10, 2017 at 6:29 PM, Guillermo Ortiz [via Apache Ignite Users] <
ml+s70518n12599h3@n6.nabble.com> wrote:

> Are indices used always that it'spossible or it depends how many records
> data you have to read as Oracle? So, if You use and index but the result
> it's to read 25% of the data, usually indices are not used because
> sequencial read are faster. As this is in memory I understand that it
> doesn't apply but I'm not sure and I didn't read anything about it in the
> documentation.
>
> 2017-05-10 17:27 GMT+02:00 Guillermo Ortiz <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=12599&i=0>>:
>
>> If I execute many times, time is similar in both.
>>
>> 2017-05-10 17:25 GMT+02:00 Guillermo Ortiz <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=12599&i=1>>:
>>
>>> I'm using 2.0.0.
>>> Right now in my laptop with 2Mill objects.
>>>
>>>
>>> *WITH INDICES*
>>> ******Result SELECT distinct(MONTH) FROM PERSONWITHINDEX
>>> September,February,March,August,June,April,July,January,May,
>>> November,October,
>>> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHINDEX:1990ms
>>> [SELECT DISTINCT
>>>     __Z0.MONTH AS __C0_0
>>> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>>>     /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */]
>>> [SELECT DISTINCT
>>>     __C0_0 AS MONTH
>>> FROM PUBLIC.__T0
>>>     /* "PersonWithIndex"."merge_scan" */]
>>>
>>>
>>> *WITHOUT INDICES*
>>> ******Result SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX
>>> September,November,February,March,May,April,July,August,Janu
>>> ary,June,October,
>>> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX:1690ms
>>> [SELECT DISTINCT
>>>     __Z0.MONTH AS __C0_0
>>> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>>>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
>>> [SELECT DISTINCT
>>>     __C0_0 AS MONTH
>>> FROM PUBLIC.__T0
>>>     /* "PersonWithoutIndex"."merge_scan" */]
>>> ***************************
>>>
>>> public class PersonWithIndex implements Serializable {
>>>     @QuerySqlField(index = true)
>>>     public int id;
>>>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=1)})
>>>     public int age;
>>>     @QuerySqlField
>>>     public String dni;
>>>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=0)})
>>>     public String month;
>>>
>>>
>>> 2017-05-10 17:06 GMT+02:00 afedotov <[hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=12599&i=2>>:
>>>
>>>> Hi,
>>>>
>>>> Which Ignite version do you use?
>>>> How many Ignite nodes do you run?
>>>> What is your cache configuration?
>>>>
>>>> As well, could you please try the following query and see the
>>>> performance?
>>>> SELECT DISTINCT age FROM PersonWithindex;
>>>>
>>>>
>>>> Kind regards,
>>>> Alex.
>>>>
>>>> On Wed, May 10, 2017 at 5:57 PM, Guillermo Ortiz [via Apache Ignite
>>>> Users] <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=12593&i=0>> wrote:
>>>>
>>>>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects
>>>>> (1,2GB) and it takes about 30sec.
>>>>> I have tried to execute the query with the same collection with and
>>>>> without indices with the same time results.
>>>>>
>>>>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>>>>> and 48cores each one. We tried to execute there with a real dataset(22M
>>>>> rows, 7gb) but it takes so long that query is aborted.
>>>>>
>>>>>
>>>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[hidden email]
>>>>> <http:///user/SendEmail.jtp?type=node&node=12592&i=0>>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Why do you think query is slow?
>>>>>> What its execution time? What is expected time?
>>>>>> How many nodes do you use? How much data does cache contains?
>>>>>>
>>>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[hidden email]
>>>>>> <http:///user/SendEmail.jtp?type=node&node=12592&i=1>> wrote:
>>>>>>
>>>>>>> I have a simple cache where I have Person object (id, age and so
>>>>>>> on..)
>>>>>>>
>>>>>>> I have created indeces by id and age to try some examples but
>>>>>>> queries with "group by" go really slow.
>>>>>>>
>>>>>>> I'm trying this query:
>>>>>>> SELECT age
>>>>>>> FROM
>>>>>>> PersonWithindex
>>>>>>> group by age
>>>>>>>
>>>>>>>
>>>>>>> SELECT
>>>>>>>     AGE AS __C0
>>>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>>>>     /* "personCacheWithIndex"."age_idx" */
>>>>>>> GROUP BY AGE
>>>>>>> /* group sorted */
>>>>>>>
>>>>>>> SELECT
>>>>>>>     __C0 AS AGE
>>>>>>> FROM PUBLIC.__T0
>>>>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>>>>> GROUP BY __C0
>>>>>>>
>>>>>>> Although it seems that it uses index, why is it going so slow? I
>>>>>>> think that it should be pretty fast with an index.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Andrey V. Mashenkov
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> If you reply to this email, your message will be added to the
>>>>> discussion below:
>>>>> http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-
>>>>> index-is-really-slow-tp12526p12592.html
>>>>> To start a new topic under Apache Ignite Users, email [hidden email]
>>>>> <http:///user/SendEmail.jtp?type=node&node=12593&i=1>
>>>>> To unsubscribe from Apache Ignite Users, click here.
>>>>> NAML
>>>>> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> View this message in context: Re: GroupBy with index is really slow.
>>>> <http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-index-is-really-slow-tp12526p12593.html>
>>>> Sent from the Apache Ignite Users mailing list archive
>>>> <http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.
>>>>
>>>
>>>
>>
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-ignite-users.70518.x6.nabble.com/GroupBy-
> with-index-is-really-slow-tp12526p12599.html
> To start a new topic under Apache Ignite Users, email
> ml+s70518n1h65@n6.nabble.com
> To unsubscribe from Apache Ignite Users, click here
> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=YWxleGFuZGVyLmZlZG90b2ZmQGdtYWlsLmNvbXwxfC0xMzYxNTU0NTg=>
> .
> NAML
> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-index-is-really-slow-tp12526p12602.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: GroupBy with index is really slow.

Posted by Guillermo Ortiz <ko...@gmail.com>.

Are indices used always that it'spossible or it depends how many records
data you have to read as Oracle? So, if You use and index but the result
it's to read 25% of the data, usually indices are not used because
sequencial read are faster. As this is in memory I understand that it
doesn't apply but I'm not sure and I didn't read anything about it in the
documentation.

2017-05-10 17:27 GMT+02:00 Guillermo Ortiz <ko...@gmail.com>:

> If I execute many times, time is similar in both.
>
> 2017-05-10 17:25 GMT+02:00 Guillermo Ortiz <ko...@gmail.com>:
>
>> I'm using 2.0.0.
>> Right now in my laptop with 2Mill objects.
>>
>>
>> *WITH INDICES*
>> ******Result SELECT distinct(MONTH) FROM PERSONWITHINDEX
>> September,February,March,August,June,April,July,January,May,
>> November,October,
>> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHINDEX:1990ms
>> [SELECT DISTINCT
>>     __Z0.MONTH AS __C0_0
>> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>>     /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */]
>> [SELECT DISTINCT
>>     __C0_0 AS MONTH
>> FROM PUBLIC.__T0
>>     /* "PersonWithIndex"."merge_scan" */]
>>
>>
>> *WITHOUT INDICES*
>> ******Result SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX
>> September,November,February,March,May,April,July,August,Janu
>> ary,June,October,
>> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX:1690ms
>> [SELECT DISTINCT
>>     __Z0.MONTH AS __C0_0
>> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
>> [SELECT DISTINCT
>>     __C0_0 AS MONTH
>> FROM PUBLIC.__T0
>>     /* "PersonWithoutIndex"."merge_scan" */]
>> ***************************
>>
>> public class PersonWithIndex implements Serializable {
>>     @QuerySqlField(index = true)
>>     public int id;
>>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=1)})
>>     public int age;
>>     @QuerySqlField
>>     public String dni;
>>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=0)})
>>     public String month;
>>
>>
>> 2017-05-10 17:06 GMT+02:00 afedotov <al...@gmail.com>:
>>
>>> Hi,
>>>
>>> Which Ignite version do you use?
>>> How many Ignite nodes do you run?
>>> What is your cache configuration?
>>>
>>> As well, could you please try the following query and see the
>>> performance?
>>> SELECT DISTINCT age FROM PersonWithindex;
>>>
>>>
>>> Kind regards,
>>> Alex.
>>>
>>> On Wed, May 10, 2017 at 5:57 PM, Guillermo Ortiz [via Apache Ignite
>>> Users] <[hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=12593&i=0>> wrote:
>>>
>>>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects
>>>> (1,2GB) and it takes about 30sec.
>>>> I have tried to execute the query with the same collection with and
>>>> without indices with the same time results.
>>>>
>>>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>>>> and 48cores each one. We tried to execute there with a real dataset(22M
>>>> rows, 7gb) but it takes so long that query is aborted.
>>>>
>>>>
>>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=12592&i=0>>:
>>>>
>>>>> Hi,
>>>>>
>>>>> Why do you think query is slow?
>>>>> What its execution time? What is expected time?
>>>>> How many nodes do you use? How much data does cache contains?
>>>>>
>>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[hidden email]
>>>>> <http:///user/SendEmail.jtp?type=node&node=12592&i=1>> wrote:
>>>>>
>>>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>>>
>>>>>> I have created indeces by id and age to try some examples but queries
>>>>>> with "group by" go really slow.
>>>>>>
>>>>>> I'm trying this query:
>>>>>> SELECT age
>>>>>> FROM
>>>>>> PersonWithindex
>>>>>> group by age
>>>>>>
>>>>>>
>>>>>> SELECT
>>>>>>     AGE AS __C0
>>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>>>     /* "personCacheWithIndex"."age_idx" */
>>>>>> GROUP BY AGE
>>>>>> /* group sorted */
>>>>>>
>>>>>> SELECT
>>>>>>     __C0 AS AGE
>>>>>> FROM PUBLIC.__T0
>>>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>>>> GROUP BY __C0
>>>>>>
>>>>>> Although it seems that it uses index, why is it going so slow? I
>>>>>> think that it should be pretty fast with an index.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Andrey V. Mashenkov
>>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> If you reply to this email, your message will be added to the
>>>> discussion below:
>>>> http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-
>>>> index-is-really-slow-tp12526p12592.html
>>>> To start a new topic under Apache Ignite Users, email [hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=12593&i=1>
>>>> To unsubscribe from Apache Ignite Users, click here.
>>>> NAML
>>>> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>>>
>>>
>>>
>>> ------------------------------
>>> View this message in context: Re: GroupBy with index is really slow.
>>> <http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-index-is-really-slow-tp12526p12593.html>
>>> Sent from the Apache Ignite Users mailing list archive
>>> <http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.
>>>
>>
>>
>

Re: GroupBy with index is really slow.

Posted by Guillermo Ortiz <ko...@gmail.com>.

If I execute many times, time is similar in both.

2017-05-10 17:25 GMT+02:00 Guillermo Ortiz <ko...@gmail.com>:

> I'm using 2.0.0.
> Right now in my laptop with 2Mill objects.
>
>
> *WITH INDICES*
> ******Result SELECT distinct(MONTH) FROM PERSONWITHINDEX
> September,February,March,August,June,April,July,
> January,May,November,October,
> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHINDEX:1990ms
> [SELECT DISTINCT
>     __Z0.MONTH AS __C0_0
> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>     /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */]
> [SELECT DISTINCT
>     __C0_0 AS MONTH
> FROM PUBLIC.__T0
>     /* "PersonWithIndex"."merge_scan" */]
>
>
> *WITHOUT INDICES*
> ******Result SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX
> September,November,February,March,May,April,July,August,
> January,June,October,
> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX:1690ms
> [SELECT DISTINCT
>     __Z0.MONTH AS __C0_0
> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
> [SELECT DISTINCT
>     __C0_0 AS MONTH
> FROM PUBLIC.__T0
>     /* "PersonWithoutIndex"."merge_scan" */]
> ***************************
>
> public class PersonWithIndex implements Serializable {
>     @QuerySqlField(index = true)
>     public int id;
>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=1)})
>     public int age;
>     @QuerySqlField
>     public String dni;
>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=0)})
>     public String month;
>
>
> 2017-05-10 17:06 GMT+02:00 afedotov <al...@gmail.com>:
>
>> Hi,
>>
>> Which Ignite version do you use?
>> How many Ignite nodes do you run?
>> What is your cache configuration?
>>
>> As well, could you please try the following query and see the performance?
>> SELECT DISTINCT age FROM PersonWithindex;
>>
>>
>> Kind regards,
>> Alex.
>>
>> On Wed, May 10, 2017 at 5:57 PM, Guillermo Ortiz [via Apache Ignite
>> Users] <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=12593&i=0>> wrote:
>>
>>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects
>>> (1,2GB) and it takes about 30sec.
>>> I have tried to execute the query with the same collection with and
>>> without indices with the same time results.
>>>
>>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>>> and 48cores each one. We tried to execute there with a real dataset(22M
>>> rows, 7gb) but it takes so long that query is aborted.
>>>
>>>
>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=12592&i=0>>:
>>>
>>>> Hi,
>>>>
>>>> Why do you think query is slow?
>>>> What its execution time? What is expected time?
>>>> How many nodes do you use? How much data does cache contains?
>>>>
>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=12592&i=1>> wrote:
>>>>
>>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>>
>>>>> I have created indeces by id and age to try some examples but queries
>>>>> with "group by" go really slow.
>>>>>
>>>>> I'm trying this query:
>>>>> SELECT age
>>>>> FROM
>>>>> PersonWithindex
>>>>> group by age
>>>>>
>>>>>
>>>>> SELECT
>>>>>     AGE AS __C0
>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>>     /* "personCacheWithIndex"."age_idx" */
>>>>> GROUP BY AGE
>>>>> /* group sorted */
>>>>>
>>>>> SELECT
>>>>>     __C0 AS AGE
>>>>> FROM PUBLIC.__T0
>>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>>> GROUP BY __C0
>>>>>
>>>>> Although it seems that it uses index, why is it going so slow? I think
>>>>> that it should be pretty fast with an index.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrey V. Mashenkov
>>>>
>>>
>>>
>>>
>>> ------------------------------
>>> If you reply to this email, your message will be added to the discussion
>>> below:
>>> http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-
>>> index-is-really-slow-tp12526p12592.html
>>> To start a new topic under Apache Ignite Users, email [hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=12593&i=1>
>>> To unsubscribe from Apache Ignite Users, click here.
>>> NAML
>>> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>>
>>
>>
>> ------------------------------
>> View this message in context: Re: GroupBy with index is really slow.
>> <http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-index-is-really-slow-tp12526p12593.html>
>> Sent from the Apache Ignite Users mailing list archive
>> <http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.
>>
>
>

Re: GroupBy with index is really slow.

Posted by Guillermo Ortiz <ko...@gmail.com>.

I'm using 2.0.0.
Right now in my laptop with 2Mill objects.


*WITH INDICES*
******Result SELECT distinct(MONTH) FROM PERSONWITHINDEX
September,February,March,August,June,April,July,January,May,November,October,
Elapsed time SELECT distinct(MONTH) FROM PERSONWITHINDEX:1990ms
[SELECT DISTINCT
    __Z0.MONTH AS __C0_0
FROM "PersonWithIndex".PERSONWITHINDEX __Z0
    /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */]
[SELECT DISTINCT
    __C0_0 AS MONTH
FROM PUBLIC.__T0
    /* "PersonWithIndex"."merge_scan" */]


*WITHOUT INDICES*
******Result SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX
September,November,February,March,May,April,July,August,January,June,October,
Elapsed time SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX:1690ms
[SELECT DISTINCT
    __Z0.MONTH AS __C0_0
FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
    /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
[SELECT DISTINCT
    __C0_0 AS MONTH
FROM PUBLIC.__T0
    /* "PersonWithoutIndex"."merge_scan" */]
***************************

public class PersonWithIndex implements Serializable {
    @QuerySqlField(index = true)
    public int id;
    @QuerySqlField(index = true, orderedGroups =
{@QuerySqlField.Group(name="index_group_month_age", order=1)})
    public int age;
    @QuerySqlField
    public String dni;
    @QuerySqlField(index = true, orderedGroups =
{@QuerySqlField.Group(name="index_group_month_age", order=0)})
    public String month;


2017-05-10 17:06 GMT+02:00 afedotov <al...@gmail.com>:

> Hi,
>
> Which Ignite version do you use?
> How many Ignite nodes do you run?
> What is your cache configuration?
>
> As well, could you please try the following query and see the performance?
> SELECT DISTINCT age FROM PersonWithindex;
>
>
> Kind regards,
> Alex.
>
> On Wed, May 10, 2017 at 5:57 PM, Guillermo Ortiz [via Apache Ignite Users]
> <[hidden email] <http:///user/SendEmail.jtp?type=node&node=12593&i=0>>
> wrote:
>
>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
>> and it takes about 30sec.
>> I have tried to execute the query with the same collection with and
>> without indices with the same time results.
>>
>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>> and 48cores each one. We tried to execute there with a real dataset(22M
>> rows, 7gb) but it takes so long that query is aborted.
>>
>>
>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=12592&i=0>>:
>>
>>> Hi,
>>>
>>> Why do you think query is slow?
>>> What its execution time? What is expected time?
>>> How many nodes do you use? How much data does cache contains?
>>>
>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=12592&i=1>> wrote:
>>>
>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>
>>>> I have created indeces by id and age to try some examples but queries
>>>> with "group by" go really slow.
>>>>
>>>> I'm trying this query:
>>>> SELECT age
>>>> FROM
>>>> PersonWithindex
>>>> group by age
>>>>
>>>>
>>>> SELECT
>>>>     AGE AS __C0
>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>     /* "personCacheWithIndex"."age_idx" */
>>>> GROUP BY AGE
>>>> /* group sorted */
>>>>
>>>> SELECT
>>>>     __C0 AS AGE
>>>> FROM PUBLIC.__T0
>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>> GROUP BY __C0
>>>>
>>>> Although it seems that it uses index, why is it going so slow? I think
>>>> that it should be pretty fast with an index.
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>
>>
>>
>> ------------------------------
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-
>> index-is-really-slow-tp12526p12592.html
>> To start a new topic under Apache Ignite Users, email [hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=12593&i=1>
>> To unsubscribe from Apache Ignite Users, click here.
>> NAML
>> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>
>
>
> ------------------------------
> View this message in context: Re: GroupBy with index is really slow.
> <http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-index-is-really-slow-tp12526p12593.html>
> Sent from the Apache Ignite Users mailing list archive
> <http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.
>

Re: GroupBy with index is really slow.

Posted by afedotov <al...@gmail.com>.

Hi,

Which Ignite version do you use?
How many Ignite nodes do you run?
What is your cache configuration?

As well, could you please try the following query and see the performance?
SELECT DISTINCT age FROM PersonWithindex;


Kind regards,
Alex.

On Wed, May 10, 2017 at 5:57 PM, Guillermo Ortiz [via Apache Ignite Users] <
ml+s70518n12592h86@n6.nabble.com> wrote:

> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
> and it takes about 30sec.
> I have tried to execute the query with the same collection with and
> without indices with the same time results.
>
> This is a test before to run in the real cluster with 6 nodes of 512Gb and
> 48cores each one. We tried to execute there with a real dataset(22M rows,
> 7gb) but it takes so long that query is aborted.
>
>
> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=12592&i=0>>:
>
>> Hi,
>>
>> Why do you think query is slow?
>> What its execution time? What is expected time?
>> How many nodes do you use? How much data does cache contains?
>>
>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=12592&i=1>> wrote:
>>
>>> I have a simple cache where I have Person object (id, age and so on..)
>>>
>>> I have created indeces by id and age to try some examples but queries
>>> with "group by" go really slow.
>>>
>>> I'm trying this query:
>>> SELECT age
>>> FROM
>>> PersonWithindex
>>> group by age
>>>
>>>
>>> SELECT
>>>     AGE AS __C0
>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>     /* "personCacheWithIndex"."age_idx" */
>>> GROUP BY AGE
>>> /* group sorted */
>>>
>>> SELECT
>>>     __C0 AS AGE
>>> FROM PUBLIC.__T0
>>>     /* "personCacheWithIndex"."merge_scan" */
>>> GROUP BY __C0
>>>
>>> Although it seems that it uses index, why is it going so slow? I think
>>> that it should be pretty fast with an index.
>>>
>>
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-ignite-users.70518.x6.nabble.com/GroupBy-
> with-index-is-really-slow-tp12526p12592.html
> To start a new topic under Apache Ignite Users, email
> ml+s70518n1h65@n6.nabble.com
> To unsubscribe from Apache Ignite Users, click here
> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=YWxleGFuZGVyLmZlZG90b2ZmQGdtYWlsLmNvbXwxfC0xMzYxNTU0NTg=>
> .
> NAML
> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-index-is-really-slow-tp12526p12593.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: GroupBy with index is really slow.

Posted by Guillermo Ortiz <ko...@gmail.com>.

Okay, I'll try with that.

2017-05-11 13:49 GMT+02:00 Andrey Mashenkov <an...@gmail.com>:

> Hi,
>
> Seems, full index scan is needed for this query.
> Adding more nodes and\or using queryParallelism feature [1] should reduce
> query time by utilizing multiple threads, when each thread will process
> with smaller index.
>
> [1] https://apacheignite.readme.io/docs/sql-performance-and-debugging#
> query-parallelism
>
> On Wed, May 10, 2017 at 9:55 PM, Guillermo Ortiz <ko...@gmail.com>
> wrote:
>
>> Yes,
>>
>> ******Result SELECT distinct(age) FROM PERSONWITHINDEX
>> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
>> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
>> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
>> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
>> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
>> Elapsed time SELECT distinct(age) FROM PERSONWITHINDEX:2880ms
>> [SELECT DISTINCT
>>     __Z0.AGE AS __C0_0
>> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>>     /* "PersonWithIndex".PERSONWITHINDEX_AGE_IDX */]
>> [SELECT DISTINCT
>>     __C0_0 AS AGE
>> FROM PUBLIC.__T0
>>     /* "PersonWithIndex"."merge_scan" */]
>>
>>
>>
>> ******Result SELECT distinct(age) FROM PERSONWITHOUTINDEX
>> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
>> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
>> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
>> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
>> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
>> Elapsed time SELECT distinct(age) FROM PERSONWITHOUTINDEX:2976ms
>> [SELECT DISTINCT
>>     __Z0.AGE AS __C0_0
>> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
>> [SELECT DISTINCT
>>     __C0_0 AS AGE
>> FROM PUBLIC.__T0
>>     /* "PersonWithoutIndex"."merge_scan" */]
>>
>> Age is a value between 0-100. I tried with a value between 0-1000 with
>> the same result.
>> It seems that it's not using any index.
>>
>> I'm not sure if in memory a random access is slower than a sequencial
>> access. I guess, but I'm not sure.
>>
>>
>> 2017-05-10 19:50 GMT+02:00 Andrey Mashenkov <an...@gmail.com>:
>>
>>> Hi,
>>>
>>> I've just think you can achive same result with query without groupBy.
>>> Select distinct age from ..
>>>
>>> Does it workable for you?
>>>
>>> 10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <
>>> konstt2000@gmail.com> написал:
>>>
>>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects
>>>> (1,2GB) and it takes about 30sec.
>>>> I have tried to execute the query with the same collection with and
>>>> without indices with the same time results.
>>>>
>>>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>>>> and 48cores each one. We tried to execute there with a real dataset(22M
>>>> rows, 7gb) but it takes so long that query is aborted.
>>>>
>>>>
>>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <andrey.mashenkov@gmail.com
>>>> >:
>>>>
>>>>> Hi,
>>>>>
>>>>> Why do you think query is slow?
>>>>> What its execution time? What is expected time?
>>>>> How many nodes do you use? How much data does cache contains?
>>>>>
>>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <ko...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>>>
>>>>>> I have created indeces by id and age to try some examples but queries
>>>>>> with "group by" go really slow.
>>>>>>
>>>>>> I'm trying this query:
>>>>>> SELECT age
>>>>>> FROM
>>>>>> PersonWithindex
>>>>>> group by age
>>>>>>
>>>>>>
>>>>>> SELECT
>>>>>>     AGE AS __C0
>>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>>>     /* "personCacheWithIndex"."age_idx" */
>>>>>> GROUP BY AGE
>>>>>> /* group sorted */
>>>>>>
>>>>>> SELECT
>>>>>>     __C0 AS AGE
>>>>>> FROM PUBLIC.__T0
>>>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>>>> GROUP BY __C0
>>>>>>
>>>>>> Although it seems that it uses index, why is it going so slow? I
>>>>>> think that it should be pretty fast with an index.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Andrey V. Mashenkov
>>>>>
>>>>
>>>>
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: GroupBy with index is really slow.

Posted by Andrey Mashenkov <an...@gmail.com>.

Hi,

Seems, full index scan is needed for this query.
Adding more nodes and\or using queryParallelism feature [1] should reduce
query time by utilizing multiple threads, when each thread will process
with smaller index.

[1]
https://apacheignite.readme.io/docs/sql-performance-and-debugging#query-parallelism

On Wed, May 10, 2017 at 9:55 PM, Guillermo Ortiz <ko...@gmail.com>
wrote:

> Yes,
>
> ******Result SELECT distinct(age) FROM PERSONWITHINDEX
> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
> Elapsed time SELECT distinct(age) FROM PERSONWITHINDEX:2880ms
> [SELECT DISTINCT
>     __Z0.AGE AS __C0_0
> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>     /* "PersonWithIndex".PERSONWITHINDEX_AGE_IDX */]
> [SELECT DISTINCT
>     __C0_0 AS AGE
> FROM PUBLIC.__T0
>     /* "PersonWithIndex"."merge_scan" */]
>
>
>
> ******Result SELECT distinct(age) FROM PERSONWITHOUTINDEX
> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
> Elapsed time SELECT distinct(age) FROM PERSONWITHOUTINDEX:2976ms
> [SELECT DISTINCT
>     __Z0.AGE AS __C0_0
> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
> [SELECT DISTINCT
>     __C0_0 AS AGE
> FROM PUBLIC.__T0
>     /* "PersonWithoutIndex"."merge_scan" */]
>
> Age is a value between 0-100. I tried with a value between 0-1000 with the
> same result.
> It seems that it's not using any index.
>
> I'm not sure if in memory a random access is slower than a sequencial
> access. I guess, but I'm not sure.
>
>
> 2017-05-10 19:50 GMT+02:00 Andrey Mashenkov <an...@gmail.com>:
>
>> Hi,
>>
>> I've just think you can achive same result with query without groupBy.
>> Select distinct age from ..
>>
>> Does it workable for you?
>>
>> 10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <ko...@gmail.com>
>> написал:
>>
>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
>>> and it takes about 30sec.
>>> I have tried to execute the query with the same collection with and
>>> without indices with the same time results.
>>>
>>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>>> and 48cores each one. We tried to execute there with a real dataset(22M
>>> rows, 7gb) but it takes so long that query is aborted.
>>>
>>>
>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <an...@gmail.com>
>>> :
>>>
>>>> Hi,
>>>>
>>>> Why do you think query is slow?
>>>> What its execution time? What is expected time?
>>>> How many nodes do you use? How much data does cache contains?
>>>>
>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <ko...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>>
>>>>> I have created indeces by id and age to try some examples but queries
>>>>> with "group by" go really slow.
>>>>>
>>>>> I'm trying this query:
>>>>> SELECT age
>>>>> FROM
>>>>> PersonWithindex
>>>>> group by age
>>>>>
>>>>>
>>>>> SELECT
>>>>>     AGE AS __C0
>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>>     /* "personCacheWithIndex"."age_idx" */
>>>>> GROUP BY AGE
>>>>> /* group sorted */
>>>>>
>>>>> SELECT
>>>>>     __C0 AS AGE
>>>>> FROM PUBLIC.__T0
>>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>>> GROUP BY __C0
>>>>>
>>>>> Although it seems that it uses index, why is it going so slow? I think
>>>>> that it should be pretty fast with an index.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrey V. Mashenkov
>>>>
>>>
>>>
>


-- 
Best regards,
Andrey V. Mashenkov

Re: GroupBy with index is really slow.

Posted by Guillermo Ortiz <ko...@gmail.com>.

Yes,

******Result SELECT distinct(age) FROM PERSONWITHINDEX
97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
Elapsed time SELECT distinct(age) FROM PERSONWITHINDEX:2880ms
[SELECT DISTINCT
    __Z0.AGE AS __C0_0
FROM "PersonWithIndex".PERSONWITHINDEX __Z0
    /* "PersonWithIndex".PERSONWITHINDEX_AGE_IDX */]
[SELECT DISTINCT
    __C0_0 AS AGE
FROM PUBLIC.__T0
    /* "PersonWithIndex"."merge_scan" */]



******Result SELECT distinct(age) FROM PERSONWITHOUTINDEX
97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
Elapsed time SELECT distinct(age) FROM PERSONWITHOUTINDEX:2976ms
[SELECT DISTINCT
    __Z0.AGE AS __C0_0
FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
    /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
[SELECT DISTINCT
    __C0_0 AS AGE
FROM PUBLIC.__T0
    /* "PersonWithoutIndex"."merge_scan" */]

Age is a value between 0-100. I tried with a value between 0-1000 with the
same result.
It seems that it's not using any index.

I'm not sure if in memory a random access is slower than a sequencial
access. I guess, but I'm not sure.


2017-05-10 19:50 GMT+02:00 Andrey Mashenkov <an...@gmail.com>:

> Hi,
>
> I've just think you can achive same result with query without groupBy.
> Select distinct age from ..
>
> Does it workable for you?
>
> 10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <ko...@gmail.com>
> написал:
>
> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
>> and it takes about 30sec.
>> I have tried to execute the query with the same collection with and
>> without indices with the same time results.
>>
>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>> and 48cores each one. We tried to execute there with a real dataset(22M
>> rows, 7gb) but it takes so long that query is aborted.
>>
>>
>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <an...@gmail.com>:
>>
>>> Hi,
>>>
>>> Why do you think query is slow?
>>> What its execution time? What is expected time?
>>> How many nodes do you use? How much data does cache contains?
>>>
>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <ko...@gmail.com>
>>> wrote:
>>>
>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>
>>>> I have created indeces by id and age to try some examples but queries
>>>> with "group by" go really slow.
>>>>
>>>> I'm trying this query:
>>>> SELECT age
>>>> FROM
>>>> PersonWithindex
>>>> group by age
>>>>
>>>>
>>>> SELECT
>>>>     AGE AS __C0
>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>     /* "personCacheWithIndex"."age_idx" */
>>>> GROUP BY AGE
>>>> /* group sorted */
>>>>
>>>> SELECT
>>>>     __C0 AS AGE
>>>> FROM PUBLIC.__T0
>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>> GROUP BY __C0
>>>>
>>>> Although it seems that it uses index, why is it going so slow? I think
>>>> that it should be pretty fast with an index.
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>
>>

Re: GroupBy with index is really slow.

Posted by Andrey Mashenkov <an...@gmail.com>.

Hi,

I've just think you can achive same result with query without groupBy.
Select distinct age from ..

Does it workable for you?

10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <ko...@gmail.com>
написал:

> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
> and it takes about 30sec.
> I have tried to execute the query with the same collection with and
> without indices with the same time results.
>
> This is a test before to run in the real cluster with 6 nodes of 512Gb and
> 48cores each one. We tried to execute there with a real dataset(22M rows,
> 7gb) but it takes so long that query is aborted.
>
>
> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <an...@gmail.com>:
>
>> Hi,
>>
>> Why do you think query is slow?
>> What its execution time? What is expected time?
>> How many nodes do you use? How much data does cache contains?
>>
>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <ko...@gmail.com>
>> wrote:
>>
>>> I have a simple cache where I have Person object (id, age and so on..)
>>>
>>> I have created indeces by id and age to try some examples but queries
>>> with "group by" go really slow.
>>>
>>> I'm trying this query:
>>> SELECT age
>>> FROM
>>> PersonWithindex
>>> group by age
>>>
>>>
>>> SELECT
>>>     AGE AS __C0
>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>     /* "personCacheWithIndex"."age_idx" */
>>> GROUP BY AGE
>>> /* group sorted */
>>>
>>> SELECT
>>>     __C0 AS AGE
>>> FROM PUBLIC.__T0
>>>     /* "personCacheWithIndex"."merge_scan" */
>>> GROUP BY __C0
>>>
>>> Although it seems that it uses index, why is it going so slow? I think
>>> that it should be pretty fast with an index.
>>>
>>
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
>

Re: GroupBy with index is really slow.

Posted by Guillermo Ortiz <ko...@gmail.com>.

It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
and it takes about 30sec.
I have tried to execute the query with the same collection with and without
indices with the same time results.

This is a test before to run in the real cluster with 6 nodes of 512Gb and
48cores each one. We tried to execute there with a real dataset(22M rows,
7gb) but it takes so long that query is aborted.


2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <an...@gmail.com>:

> Hi,
>
> Why do you think query is slow?
> What its execution time? What is expected time?
> How many nodes do you use? How much data does cache contains?
>
> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <ko...@gmail.com>
> wrote:
>
>> I have a simple cache where I have Person object (id, age and so on..)
>>
>> I have created indeces by id and age to try some examples but queries
>> with "group by" go really slow.
>>
>> I'm trying this query:
>> SELECT age
>> FROM
>> PersonWithindex
>> group by age
>>
>>
>> SELECT
>>     AGE AS __C0
>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>     /* "personCacheWithIndex"."age_idx" */
>> GROUP BY AGE
>> /* group sorted */
>>
>> SELECT
>>     __C0 AS AGE
>> FROM PUBLIC.__T0
>>     /* "personCacheWithIndex"."merge_scan" */
>> GROUP BY __C0
>>
>> Although it seems that it uses index, why is it going so slow? I think
>> that it should be pretty fast with an index.
>>
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: GroupBy with index is really slow.

Posted by Andrey Mashenkov <an...@gmail.com>.

Hi,

Why do you think query is slow?
What its execution time? What is expected time?
How many nodes do you use? How much data does cache contains?

On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <ko...@gmail.com>
wrote:

> I have a simple cache where I have Person object (id, age and so on..)
>
> I have created indeces by id and age to try some examples but queries with
> "group by" go really slow.
>
> I'm trying this query:
> SELECT age
> FROM
> PersonWithindex
> group by age
>
>
> SELECT
>     AGE AS __C0
> FROM "personCacheWithIndex".PERSONWITHINDEX
>     /* "personCacheWithIndex"."age_idx" */
> GROUP BY AGE
> /* group sorted */
>
> SELECT
>     __C0 AS AGE
> FROM PUBLIC.__T0
>     /* "personCacheWithIndex"."merge_scan" */
> GROUP BY __C0
>
> Although it seems that it uses index, why is it going so slow? I think
> that it should be pretty fast with an index.
>



-- 
Best regards,
Andrey V. Mashenkov