Posted to dev@iotdb.apache.org by "Jialin Qiao (Jira)" <ji...@apache.org> on 2020/02/26 02:49:00 UTC
[jira] [Closed] (IOTDB-306) count query is not that fast

     [ https://issues.apache.org/jira/browse/IOTDB-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jialin Qiao closed IOTDB-306.
-----------------------------
    Fix Version/s: 0.10.0-SNAPSHOT
       Resolution: Fixed

> count query is not that fast
> ----------------------------
>
>                 Key: IOTDB-306
>                 URL: https://issues.apache.org/jira/browse/IOTDB-306
>             Project: Apache IoTDB
>          Issue Type: Improvement
>            Reporter: Lei Rui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0-SNAPSHOT
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to my test, 
> *q1: select count(s_10) from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*
> ||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
> |23,998|1,367|13,591|7,592|
>  Unit: ms
> *q2: select s_10 from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*
> ||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
> |27,783|31.2+2,068|134+13,880|14.9+9,587|
>  Unit: ms
> (A value of the form "a+b" means that the step happens in both the `createNewDataSet` and `convertQueryDataSetByFetchSize` phases.)
> As shown, the total time cost of q1 (23,998 ms) is only about 14% lower than that of q2 (27,783 ms), and the costs of the three major steps - `readTsFileMetaData`, `readTsDeviceMetaData`, and `readMemChunk` - are very close in the two queries.
> The reason is that the count query reads chunk data from disk into memory anyway; in the best case it uses the statistics (i.e., numOfPoints) in the pageHeader instead of reading the page data. However, reading page data (see `ChunkReader.nextBatch`) is not that expensive, because it is performed in memory. Therefore, the execution path of the count query mostly overlaps with that of the same query without count.
> Other aggregate queries probably show similar results.
> One direction for improving the performance of the count query (and probably other aggregate queries) is to avoid `readMemChunk` whenever the statistics in the ChunkMetaData can be used.
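
The direction proposed in the description can be sketched roughly as follows. This is a minimal, self-contained illustration and not IoTDB's actual code: `ChunkMeta`, `readChunk`, and `count` are hypothetical names, and the chunk data "on disk" is simulated with an in-memory array. It also assumes that every chunk is non-empty, that its timestamps are sorted, and that no deletions or overwrites invalidate the stored statistics. A chunk whose time range lies entirely inside the query range is counted from its metadata alone; only chunks that partially overlap the range are read.

```java
import java.util.Arrays;
import java.util.List;

class CountByChunkStatistics {

    /** Hypothetical stand-in for per-chunk metadata; not IoTDB's ChunkMetaData class. */
    static final class ChunkMeta {
        final long startTime;
        final long endTime;
        final long numOfPoints;
        final long[] timestampsOnDisk; // simulates the chunk data stored on disk

        ChunkMeta(long[] sortedTimestamps) {
            this.timestampsOnDisk = sortedTimestamps;
            this.startTime = sortedTimestamps[0];
            this.endTime = sortedTimestamps[sortedTimestamps.length - 1];
            this.numOfPoints = sortedTimestamps.length;
        }
    }

    /** Number of simulated chunk reads, to make the saved work visible. */
    static int chunkReads = 0;

    /** Hypothetical chunk loader; stands in for the expensive read step. */
    static long[] readChunk(ChunkMeta meta) {
        chunkReads++;
        return meta.timestampsOnDisk;
    }

    /** Counts points with queryStart <= time <= queryEnd. */
    static long count(List<ChunkMeta> metas, long queryStart, long queryEnd) {
        long total = 0;
        for (ChunkMeta meta : metas) {
            // Chunk lies completely outside the query range: skip it.
            if (meta.endTime < queryStart || meta.startTime > queryEnd) {
                continue;
            }
            // Chunk lies completely inside the query range: the statistics suffice.
            if (meta.startTime >= queryStart && meta.endTime <= queryEnd) {
                total += meta.numOfPoints;
                continue;
            }
            // Partial overlap: fall back to reading the chunk and filtering by time.
            for (long t : readChunk(meta)) {
                if (t >= queryStart && t <= queryEnd) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<ChunkMeta> metas = Arrays.asList(
            new ChunkMeta(new long[] {1, 2, 3}),
            new ChunkMeta(new long[] {4, 5, 6}),
            new ChunkMeta(new long[] {7, 8, 9}));
        long result = count(metas, 2, 6);
        // prints "count=5, chunkReads=1"
        System.out.println("count=" + result + ", chunkReads=" + chunkReads);
    }
}
```

In this toy run only the first chunk, which straddles the lower bound of the query, is read; the second is answered from its metadata and the third is skipped. How much a real query would gain depends on how many chunks fall entirely inside the time filter.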



--
This message was sent by Atlassian Jira
(v8.3.4#803005)