Posted to notifications@iotdb.apache.org by "Jialin Qiao (Jira)" <ji...@apache.org> on 2022/06/20 15:39:00 UTC
[jira] [Reopened] (IOTDB-544) Apache IoTDB integration with more powerful aggregation index
[ https://issues.apache.org/jira/browse/IOTDB-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jialin Qiao reopened IOTDB-544:
-------------------------------
> Apache IoTDB integration with more powerful aggregation index
> -------------------------------------------------------------
>
> Key: IOTDB-544
> URL: https://issues.apache.org/jira/browse/IOTDB-544
> Project: Apache IoTDB
> Issue Type: Wish
> Components: Core/Engine
> Reporter: Xiangdong Huang
> Assignee: Zesong Sun
> Priority: Major
> Labels: IoTDB, gsoc2020, mentor, pull-request-available
>
> IoTDB is a highly efficient time series database that supports high-speed query processing, including aggregation queries.
> Currently, IoTDB pre-calculates aggregation info, also called summary info (sum, count, max_time, min_time, max_value, min_value), for each page and each chunk. This info helps with aggregation operations and some query filters. For example, if the query filter is value > 10 and the max value of a page is 9, the page can be skipped. As another example, if the query is select max(value) and the max values of three chunks are 5, 10, and 20, then max(value) is 20.
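A minimal Java sketch of how the six per-page statistics serve both uses just described: skipping a page under a value filter, and answering max(value) from chunk maxima alone. The names PageSummaryDemo, PageSummary, canSkipGreaterThan and maxOf are hypothetical illustrations, not IoTDB's actual statistics classes; the sketch assumes every page is non-empty and its timestamps are sorted.

```java
import java.util.List;

public class PageSummaryDemo {

    // Hypothetical per-page summary holding the six statistics listed above.
    static final class PageSummary {
        final double sum;
        final long count;
        final long minTime;
        final long maxTime;
        final double minValue;
        final double maxValue;

        private PageSummary(double sum, long count, long minTime, long maxTime,
                            double minValue, double maxValue) {
            this.sum = sum;
            this.count = count;
            this.minTime = minTime;
            this.maxTime = maxTime;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }

        // Computes the summary once, when the page is sealed; assumes a
        // non-empty page with timestamps in ascending order.
        static PageSummary of(long[] times, double[] values) {
            double sum = 0;
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double v : values) {
                sum += v;
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            return new PageSummary(sum, values.length, times[0], times[times.length - 1], min, max);
        }

        // For the filter "value > threshold": if even the page maximum is not
        // above the threshold, no point in the page can match.
        boolean canSkipGreaterThan(double threshold) {
            return maxValue <= threshold;
        }
    }

    // max(value) across chunks needs only each chunk's pre-computed maximum.
    static double maxOf(List<PageSummary> summaries) {
        double max = Double.NEGATIVE_INFINITY;
        for (PageSummary s : summaries) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }

    public static void main(String[] args) {
        PageSummary page = PageSummary.of(new long[] {1, 2, 3}, new double[] {4, 9, 7});
        System.out.println(page.canSkipGreaterThan(10)); // true: the page maximum is 9

        List<PageSummary> chunks = List.of(
                PageSummary.of(new long[] {1}, new double[] {5}),
                PageSummary.of(new long[] {2}, new double[] {10}),
                PageSummary.of(new long[] {3}, new double[] {20}));
        System.out.println(maxOf(chunks)); // 20.0
    }
}
```

Neither check reads a single raw data point; both run entirely on the pre-computed summaries.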
> However, there are two drawbacks:
> 1. The summary info reduces the amount of data that needs to be scanned to 1/k (assuming each page holds k data points), but the time complexity is still O(N). If we store a long history of data, e.g., two years of data sampled at 500 kHz, aggregation may still be time-consuming. A tree-based index that reduces the time complexity from O(N) to O(log N) is therefore a good choice. Some basic ideas have been published in [1], but that approach can only handle data with a fixed frequency. Improving it and implementing it in IoTDB would therefore be valuable.
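For scale, two years of data at 500 kHz is 500,000 × 63,072,000 s ≈ 3.15 × 10^13 points, so even a scan over one summary per page grows linearly with the history. The O(log N) idea can be illustrated with a plain segment tree built over per-page summaries. This is a generic data structure swapped in for illustration: it is not the index from [1], and the class name SegmentTreeAggregation is hypothetical. It assumes query ranges are aligned to page boundaries, so a range query over P pages visits O(log P) tree nodes instead of every page summary.

```java
public class SegmentTreeAggregation {

    private final int n;         // number of pages (leaves)
    private final double[] sum;  // sum of values covered by each tree node
    private final double[] max;  // maximum value covered by each tree node

    // pageSums[i] and pageMaxes[i] are the pre-computed statistics of page i.
    SegmentTreeAggregation(double[] pageSums, double[] pageMaxes) {
        if (pageSums.length == 0 || pageSums.length != pageMaxes.length) {
            throw new IllegalArgumentException("need one entry per page and at least one page");
        }
        n = pageSums.length;
        sum = new double[4 * n];
        max = new double[4 * n];
        build(1, 0, n - 1, pageSums, pageMaxes);
    }

    // Each internal node stores the combined summary of its children.
    private void build(int node, int lo, int hi, double[] s, double[] m) {
        if (lo == hi) {
            sum[node] = s[lo];
            max[node] = m[lo];
            return;
        }
        int mid = (lo + hi) >>> 1;
        build(2 * node, lo, mid, s, m);
        build(2 * node + 1, mid + 1, hi, s, m);
        sum[node] = sum[2 * node] + sum[2 * node + 1];
        max[node] = Math.max(max[2 * node], max[2 * node + 1]);
    }

    // Sum over pages l..r (inclusive), visiting O(log n) nodes.
    double rangeSum(int l, int r) {
        return rangeSum(1, 0, n - 1, l, r);
    }

    private double rangeSum(int node, int lo, int hi, int l, int r) {
        if (r < lo || hi < l) return 0;            // node lies outside the query
        if (l <= lo && hi <= r) return sum[node];  // node fully inside: reuse its summary
        int mid = (lo + hi) >>> 1;
        return rangeSum(2 * node, lo, mid, l, r) + rangeSum(2 * node + 1, mid + 1, hi, l, r);
    }

    // Maximum over pages l..r (inclusive), visiting O(log n) nodes.
    double rangeMax(int l, int r) {
        return rangeMax(1, 0, n - 1, l, r);
    }

    private double rangeMax(int node, int lo, int hi, int l, int r) {
        if (r < lo || hi < l) return Double.NEGATIVE_INFINITY;
        if (l <= lo && hi <= r) return max[node];
        int mid = (lo + hi) >>> 1;
        return Math.max(rangeMax(2 * node, lo, mid, l, r), rangeMax(2 * node + 1, mid + 1, hi, l, r));
    }

    public static void main(String[] args) {
        SegmentTreeAggregation tree = new SegmentTreeAggregation(
                new double[] {12, 30, 41, 6}, new double[] {5, 10, 20, 3});
        System.out.println(tree.rangeMax(0, 2)); // 20.0
        System.out.println(tree.rangeSum(1, 3)); // 77.0
    }
}
```

The trade-off the issue warns about shows up here too: every sealed page adds a leaf whose ancestors must be updated, so the index adds some cost on the write path.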
> 2. The summary info does not help evaluate a query such as where value > 8 when the max value is 10: the page cannot be skipped, and the summary says nothing about how many points match. If we enrich the summary info, e.g., by storing a histogram of the data, we can use the histogram to estimate how many points the query will return.
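A sketch of histogram-based estimation, assuming an equal-width histogram stored next to the existing summary and a uniform distribution of values within each bucket. The histogram layout and the class name HistogramEstimate are illustrative assumptions, not part of IoTDB or TsFile.

```java
public class HistogramEstimate {

    private final double min;     // smallest value in the page
    private final double max;     // largest value in the page
    private final long[] counts;  // number of points in each equal-width bucket

    HistogramEstimate(double[] values, int buckets) {
        if (buckets < 1) throw new IllegalArgumentException("need at least one bucket");
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        min = lo;
        max = hi;
        counts = new long[buckets];
        for (double v : values) {
            counts[bucketOf(v)]++;
        }
    }

    private double width() {
        return (max - min) / counts.length;
    }

    private int bucketOf(double v) {
        if (max == min) return 0;
        int b = (int) ((v - min) / width());
        return Math.min(b, counts.length - 1);  // v == max belongs to the last bucket
    }

    long total() {
        long t = 0;
        for (long c : counts) t += c;
        return t;
    }

    // Estimated number of points with value > x, assuming values are spread
    // uniformly within each bucket.
    double estimateGreaterThan(double x) {
        if (x < min) return total();  // every point matches
        if (x >= max) return 0;       // nothing matches: the case min/max already handles
        double w = width();
        double estimate = 0;
        for (int b = 0; b < counts.length; b++) {
            double bucketLo = min + b * w;
            double bucketHi = bucketLo + w;
            if (bucketLo >= x) {
                estimate += counts[b];                       // bucket lies entirely above x
            } else if (bucketHi > x) {
                estimate += counts[b] * (bucketHi - x) / w;  // bucket straddles x
            }
        }
        return estimate;
    }

    public static void main(String[] args) {
        double[] page = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        HistogramEstimate h = new HistogramEstimate(page, 5);
        // max_value = 10 only tells us the page cannot be skipped for value > 8;
        // the histogram estimates how many points actually match.
        System.out.println(h.estimateGreaterThan(8));
    }
}
```

With the ten values 1 to 10 in five buckets, this prints an estimate of roughly 2.2, close to the true count of 2 (the values 9 and 10), whereas max_value alone only says the page cannot be skipped.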
> This proposal is mainly about adding an index to speed up aggregation queries. Making the summary info more useful would be a further benefit.
> Note that the premise is that insertion speed must not slow down too much!
> You should know:
> • IoTDB query process
> • TsFile structure and organization
> • Basic index knowledge
> • Java
> difficulty: Major
> mentors:
> hxd@apache.org
> Reference:
> [1] https://www.sciencedirect.com/science/article/pii/S0306437918305489
>
>
>