Posted to notifications@iotdb.apache.org by "Jialin Qiao (Jira)" <ji...@apache.org> on 2022/06/20 15:39:00 UTC
[jira] [Reopened] (IOTDB-544) Apache IoTDB integration with more powerful aggregation index

     [ https://issues.apache.org/jira/browse/IOTDB-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jialin Qiao reopened IOTDB-544:
-------------------------------

> Apache IoTDB integration with more powerful aggregation index
> -------------------------------------------------------------
>
>                 Key: IOTDB-544
>                 URL: https://issues.apache.org/jira/browse/IOTDB-544
>             Project: Apache IoTDB
>          Issue Type: Wish
>          Components: Core/Engine
>            Reporter: Xiangdong Huang
>            Assignee: Zesong Sun
>            Priority: Major
>              Labels: IoTDB, gsoc2020, mentor, pull-request-available
>
> IoTDB is a highly efficient time series database that supports high-speed query processing, including aggregation queries.
> Currently, IoTDB pre-calculates summary info (sum, count, max_time, min_time, max_value, min_value) for each page and each chunk. This info helps with aggregation operations and some query filters. For example, if the query filter is value > 10 and the max value of a page is 9, we can skip the page. As another example, if the query is select max(value) and the max values of 3 chunks are 5, 10, and 20, then max(value) is 20.
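
The two examples above can be sketched in Java as follows. This is only an illustration under stated assumptions: PageSummary and SummaryPruning are hypothetical names, not IoTDB classes, and only the fields needed here are modeled.

```java
import java.util.List;

// Hypothetical per-page (or per-chunk) summary; field names follow the issue text.
final class PageSummary {
    final long count;
    final double sum;
    final double minValue;
    final double maxValue;

    PageSummary(long count, double sum, double minValue, double maxValue) {
        this.count = count;
        this.sum = sum;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }
}

// Hypothetical helper showing how summaries avoid reading raw points.
final class SummaryPruning {
    // For the filter "value > threshold": if even the page maximum fails
    // the filter, no point in the page can match, so the page is skipped.
    static boolean canSkipForGreaterThan(PageSummary s, double threshold) {
        return s.maxValue <= threshold;
    }

    // max(value) over several chunks needs only the stored maxima.
    static double maxValue(List<PageSummary> summaries) {
        double max = Double.NEGATIVE_INFINITY;
        for (PageSummary s : summaries) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }
}
```

Note that the second method still visits one summary per chunk, which is the O(N) cost discussed in the first drawback below.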
> However, there are two drawbacks:
> 1. The summary info reduces the data that needs to be scanned to 1/k of the original (suppose each page has k data points). However, the time complexity is still O(N). If we store a long history, e.g., 2 years of data sampled at 500 kHz, the aggregation operation may still be time-consuming. A tree-based index that reduces the time complexity from O(N) to O(log N) is therefore a good choice. Some basic ideas have been published in [1], but that approach can only handle data with a fixed frequency. Improving it and implementing it in IoTDB would therefore be worthwhile.
> 2. The summary info does not help evaluate a query such as where value > 8 if the max value is 10. If we enrich the summary info, e.g., by storing a histogram of the data, we can use the histogram to estimate how many points will be returned.
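
To make the O(N) versus O(log N) point in item 1 concrete: at 500 kHz, two 365-day years amount to about 3.15 × 10^13 points, so even a 1/k reduction leaves a very long linear scan. The sketch below is a generic array-backed segment tree over per-page sums. It is not the index described in [1] and not IoTDB code; the class name SumSegmentTree is hypothetical, and it assumes the per-page sums are already available in memory as a fixed array.

```java
// Hypothetical index: a segment tree over per-page sums.
final class SumSegmentTree {
    private final int n;
    // tree[1] is the root; leaves live at tree[n .. 2n-1].
    private final long[] tree;

    SumSegmentTree(long[] pageSums) {
        n = pageSums.length;
        tree = new long[2 * n];
        System.arraycopy(pageSums, 0, tree, n, n);
        // Each internal node stores the sum of its two children.
        for (int i = n - 1; i >= 1; i--) {
            tree[i] = tree[2 * i] + tree[2 * i + 1];
        }
    }

    // Sum of pages with index in [from, to), touching O(log n) nodes.
    long query(int from, int to) {
        long result = 0;
        for (int l = from + n, r = to + n; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) result += tree[l++];
            if ((r & 1) == 1) result += tree[--r];
        }
        return result;
    }
}
```

For example, new SumSegmentTree(new long[] {3, 1, 4, 1, 5}).query(1, 4) adds up pages 1 to 3 and returns 6. The same layout works for count, min, or max by swapping the combining operation. How such a structure would be persisted and kept cheap to update on insertion is left open here.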
> This proposal is mainly about adding an index to speed up aggregation queries. Making the summary info more useful would be a further improvement.
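
One possible way to make the summary info more useful, as suggested in item 2, is an equi-width histogram per page. The sketch below is an assumption-laden illustration: ValueHistogram is a hypothetical class, the bucket layout is my choice rather than anything specified in the issue, and it assumes max > min and that every added value lies in [min, max] (which the page's min_value and max_value would guarantee).

```java
// Hypothetical enriched summary: an equi-width histogram of a page's values.
final class ValueHistogram {
    private final double min;
    private final double width;
    private final long[] counts;

    // Assumes max > min and buckets > 0.
    ValueHistogram(double min, double max, int buckets) {
        this.min = min;
        this.width = (max - min) / buckets;
        this.counts = new long[buckets];
    }

    // Assumes min <= v <= max; v == max falls into the last bucket.
    void add(double v) {
        int i = (int) ((v - min) / width);
        counts[Math.min(Math.max(i, 0), counts.length - 1)]++;
    }

    // Upper bound on the number of points with value > threshold:
    // count every bucket whose upper edge lies above the threshold.
    long upperBoundGreaterThan(double threshold) {
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            double upperEdge = min + (i + 1) * width;
            if (upperEdge > threshold) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

With min = 0, max = 10, and 5 buckets, the filter value > 8 only needs the count of the last bucket, whereas max_value = 10 alone says nothing about how many points match. The result is an upper bound, not an exact count, because points inside a partially covered bucket are not distinguished.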
> Note that the premise is that the insertion speed must not be slowed down too much!
> You should know:
>  • IoTDB query process
>  • TsFile structure and organization
>  • Basic index knowledge
>  • Java 
> difficulty: Major
>  mentors:
>  hxd@apache.org
> Reference:
> [1] [https://www.sciencedirect.com/science/article/pii/S0306437918305489]
>   
>   
>   



--
This message was sent by Atlassian Jira
(v8.20.7#820007)