You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@iotdb.apache.org by "Rong Kang (Jira)" <ji...@apache.org> on 2020/08/09 14:24:00 UTC
[jira] [Commented] (IOTDB-804) Index Framework

    [ https://issues.apache.org/jira/browse/IOTDB-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173878#comment-17173878 ] 

Rong Kang commented on IOTDB-804:
---------------------------------

One of the targets of IoTDB Index Framework is to provide a list of common and classical operators (or called "tools") in the time series domain. These operators can be roughly put into three categories. Some of them has been supported, and others will be added in the future:

- Time-Series Representations, like PAA(Piecewise Aggregate Approximation, supported), DFT(Discrete Fourier Transform, not supported yet), etc..

- Measurements, like ED(Euclidean Distance, supported), DTW(Dynamic Time Warping, supported).

- Index Techniques, like RTree(supported), NoIndex(sequential-scan-speedup), iSAX, and Hash lookup Table.

We will give a brief introduction to iSAX and Hash lookup Table, and try to implement them in the next version:

- ISAXIndex [1] first splits the input data into segments and extracts their discrete SAX features, and then inserts the data features into a root leaf node.  When the number of series of a leaf node reaches the given threshold, ISAXIndex will upscale the resolution of SAX features, thus dividing them into two leaf nodes. The SAX lower bound constraint is used to guarantee no-false-dismissals pruning.
In some indexing techniques, ISAXIndex [2] is composed of a list of subtrees. Suppose the input series is divided into b segments, all data will be first divided into 2^b sub-index tree according to the first-bits of SAX features.

- The Hash lookup table is a data structure a structure that can map keys to values. A hash table uses a hash function to compute an index, also called hash code, into an array of buckets, from which the desired value can be found. Generally, the cost of retrieving a bucket can be regarded as O(1) Hamming space retrieval [3] returns data points within a Hamming ball of radius H for each query. Therefore, it enables the constant-time Approximate Nearest Neighbor (ANN) search through hash lookups. Based on the mature hash lookup algorithm and the Hamming space retrieval, a rich line of hash methods focuses on better feature representations for preserving the similarity relationship in the Hamming space.

[1] Shieh, Jin, and Eamonn Keogh, "ISAX: Disk-Aware Mining and Indexing of Massive Time Series Datasets". Data Mining and Knowledge Discovery 19(1): 24–57. 2009
[2] Camerra, Alessandro, Themis Palpanas, Jin Shieh, and Eamonn Keogh. "ISAX 2.0: Indexing and Mining One Billion Time Series." In 2010 IEEE International Conference on Data Mining Pp. 58–67. 2010
[3] Cao, Yue, Mingsheng Long, Bin Liu, and Jianmin Wang. "Deep Cauchy Hashing for Hamming Space Retrieval." In Pp. 1229–1237.  2018.



> Index Framework
> ---------------
>
>                 Key: IOTDB-804
>                 URL: https://issues.apache.org/jira/browse/IOTDB-804
>             Project: Apache IoTDB
>          Issue Type: New Feature
>          Components: Core/Engine
>            Reporter: Rong Kang
>            Priority: Major
>
> Implement the index framework in IoTDB, supporting the time-series index: creation, building and query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)