Posted to notifications@asterixdb.apache.org by "Wenhai (JIRA)" <ji...@apache.org> on 2016/05/12 02:18:12 UTC

[jira] [Comment Edited] (ASTERIXDB-1433) Multiple cores with huge memory slow down in the big fact table aggregation.

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281114#comment-15281114 ] 

Wenhai edited comment on ASTERIXDB-1433 at 5/12/16 2:17 AM:
------------------------------------------------------------

The I/O statistics come from the iostat command, which reports an average throughput of 160MB/s (hot run) or 60MB/s (cold run). That is, after we aggregate a 60GB table, reloading the data for another aggregation takes at least 600s. Of course, one can question whether our disk system is configured to be too slow, but we do have a large amount of memory, which is not very expensive.

Best,
Wenhai
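
As a rough check of the figures in the comment above, the reload time can be estimated from table size and read throughput. This is only a back-of-the-envelope sketch: it assumes the two iostat rates are sustained aggregate read throughput, that the whole 60GB table is read sequentially, and that 1GB = 1000MB. The function names are illustrative and do not come from AsterixDB.

```python
def reload_seconds(table_gb, throughput_mb_per_s, mb_per_gb=1000):
    """Estimated time to read a table of table_gb at a sustained throughput."""
    return table_gb * mb_per_gb / throughput_mb_per_s


def implied_throughput(table_gb, seconds, mb_per_gb=1000):
    """Throughput in MB/s implied by reading table_gb in the given time."""
    return table_gb * mb_per_gb / seconds


if __name__ == "__main__":
    print(reload_seconds(60, 160))      # time at the faster reported rate
    print(reload_seconds(60, 60))       # time at the slower reported rate
    print(implied_throughput(60, 600))  # rate implied by the 600s figure
```

Under these assumptions the two reported rates give 375s and 1000s, and the quoted 600s corresponds to an effective rate of 100MB/s, which falls between the two measurements.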


was (Author: lwhay):
The I/O statistics come from the iostat command, which reports an average throughput of 160MB/s (hot run) or 60MB/s (cold run). That is, after we aggregate a 60GB table, reloading the data for another aggregation takes at least 600s. Of course, one can question whether our disk system is configured to be too slow, but we do have a large amount of memory, which is not very expensive.

> Multiple cores with huge memory slow down in the big fact table aggregation.
> ----------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1433
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1433
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: Hyracks Core
>         Environment: 10 nodes x Ubuntu Linux, 6 CPUs x 4 cores per CPU, 128 GB memory per node.
>            Reporter: Wenhai
>
> This is a typical hardware platform that holds datasets at the TB scale in total. AsterixDB does extremely well on complex queries that include multiple join operators over a high-selectivity select operator. However, the execution traces show that, despite the large memory configuration, the original tables are always reloaded from disk into memory, even when they were processed by the most recent query. Why not provide a strategy that keeps the intermediate data of the last completed query in memory and frees it when memory is insufficient for a new query? In some cases, the user repeatedly issues queries with different parameters against the same tables, for example, aggregations with varying parameters over a single big fact table.
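
The strategy requested in the description (keep data from completed queries in memory, and free it only when a new query needs the space) can be illustrated with a small byte-bounded cache. The sketch below is a toy model only, not how AsterixDB or Hyracks manages memory. The class name, the keys, and the least-recently-used eviction order are all assumptions made for the example, since the description does not say which data should be freed first.

```python
from collections import OrderedDict


class QueryResultCache:
    """Hypothetical in-memory cache of intermediate query data, bounded in bytes."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # key -> (data, size_bytes), ordered from least to most recently used
        self._entries = OrderedDict()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss: the caller has to reload from disk
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, data, size_bytes):
        if size_bytes > self.capacity_bytes:
            return False  # larger than the whole cache, so never cached
        if key in self._entries:
            # Replacing an entry: release the old size first.
            self.used_bytes -= self._entries.pop(key)[1]
        # Free least recently used entries until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self._entries[key] = (data, size_bytes)
        self.used_bytes += size_bytes
        return True


if __name__ == "__main__":
    cache = QueryResultCache(capacity_bytes=100)
    cache.put("fact_table_scan", "rows-A", 60)
    cache.put("dim_table_scan", "rows-B", 30)
    cache.get("fact_table_scan")           # reused by a second aggregation
    cache.put("other_scan", "rows-C", 40)  # needs space: frees dim_table_scan
    print(cache.get("dim_table_scan"), cache.used_bytes)
```

In this toy run the repeatedly used fact table stays in memory across queries, while the entry that was not reused is released once a new query needs the space.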


