Posted to notifications@iotdb.apache.org by "Xiangdong Huang (Jira)" <ji...@apache.org> on 2021/01/27 08:46:00 UTC

[jira] [Created] (IOTDB-1131) dictionary encoding of deviceID and measurementID in WAL

Xiangdong Huang created IOTDB-1131:
--------------------------------------

             Summary: dictionary encoding of deviceID and measurementID in WAL
                 Key: IOTDB-1131
                 URL: https://issues.apache.org/jira/browse/IOTDB-1131
             Project: Apache IoTDB
          Issue Type: Improvement
          Components: WAL
            Reporter: Xiangdong Huang


This is an interesting idea proposed by Tian Jiang.

Copied from Tian Jiang:

Write-ahead logs (WALs) ensure that data which have not yet been persisted can still be recovered after a system failure, thereby increasing the durability of a DBMS. However, WALs generally require more frequent flushes to limit the possibility of losing data, which increases disk utilization significantly, as each flush requires one disk I/O. Moreover, logs are hardly compressed or encoded the way the raw data in TsFiles are, and the result is that logs containing the same data consume much more space than the data chunks. The disadvantages are twofold: first, large logs compete for more disk bandwidth, slowing down the persistence of raw data; second, even if WALs are placed on another disk (possibly an SSD for high throughput), WALs are removed frequently once their corresponding data are persisted, and such frequent write-and-erase cycles shorten disk life, especially for SSDs.

So it is beneficial to reduce the size of WALs. In IoTDB (and in other DBMSs), the majority of WAL entries are logs of insertions, as other operations such as deletions and updates are rare compared with insertions. This observation suggests that we may focus on reducing the size of insertion logs, which is enough to attain the desired improvement for the whole system. Currently, we serialize complete physical plans into the WAL, but we notice that although values and timestamps generally vary from plan to plan, header information such as deviceIds, measurementIds and data types is highly redundant, and deviceIds and measurementIds are sometimes long strings, which may consume a significant amount of space. So in this design, we concentrate on reducing duplicated deviceIds, measurementIds and data types in WALs.

Method
To reduce duplicated deviceIds, measurementIds and data types in WALs, we use a windowed differentiation technique (or referencing) to replace redundant fields with an index pointing to a base log, if such a log can be found within a given window. The detailed procedure is described below:
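
As a rough illustration of the referencing idea only, here is a minimal Java sketch. It is not IoTDB code and not necessarily the procedure Tian Jiang has in mind: the class HeaderWindow, its nested Header and Encoded types, the backDistance field and the exact-match rule are all hypothetical names and assumptions made for this example. The writer keeps the headers of the last few insertion logs; if a new log has the same deviceId, measurementIds and data types as one of them, only the distance back to that base log is stored, otherwise the full header is stored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch of windowed referencing for insertion-log headers. */
public class HeaderWindow {

    /** The header fields that tend to repeat from one insertion log to the next. */
    public static final class Header {
        public final String deviceId;
        public final List<String> measurementIds;
        public final List<String> dataTypes;

        public Header(String deviceId, List<String> measurementIds, List<String> dataTypes) {
            this.deviceId = deviceId;
            this.measurementIds = measurementIds;
            this.dataTypes = dataTypes;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Header)) return false;
            Header h = (Header) o;
            return deviceId.equals(h.deviceId)
                && measurementIds.equals(h.measurementIds)
                && dataTypes.equals(h.dataTypes);
        }

        @Override
        public int hashCode() {
            return Objects.hash(deviceId, measurementIds, dataTypes);
        }
    }

    /** What would be serialized: a full header, or a distance back to a base log. */
    public static final class Encoded {
        public final int backDistance; // 0 = full header; k > 0 = same header as the k-th previous log
        public final Header header;    // non-null only when backDistance == 0

        Encoded(int backDistance, Header header) {
            this.backDistance = backDistance;
            this.header = header;
        }
    }

    private final int windowSize;
    private final Deque<Header> recent = new ArrayDeque<>(); // most recent log first

    public HeaderWindow(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Writer side: replace the header with a reference if a base log is in the window. */
    public Encoded encode(Header h) {
        Encoded out = new Encoded(0, h);
        int distance = 1;
        for (Header base : recent) {
            if (base.equals(h)) {
                out = new Encoded(distance, null);
                break;
            }
            distance++;
        }
        remember(h);
        return out;
    }

    /** Recovery side: resolve a reference against the same window, rebuilt while replaying. */
    public Header decode(Encoded e) {
        Header h = e.header;
        if (e.backDistance > 0) {
            if (e.backDistance > recent.size()) {
                throw new IllegalStateException("reference points outside the window");
            }
            int distance = 1;
            for (Header base : recent) {
                if (distance == e.backDistance) {
                    h = base;
                    break;
                }
                distance++;
            }
        }
        remember(h);
        return h;
    }

    /** Slide the window: add the newest header and evict the oldest if the window is full. */
    private void remember(Header h) {
        recent.addFirst(h);
        if (recent.size() > windowSize) {
            recent.removeLast();
        }
    }
}
```

In this sketch the writer and the recovery process each own one HeaderWindow with the same window size, and recovery replays the logs in write order so that both windows always hold the same headers. A real design would also have to make sure that a reference never points at a log that has already been deleted, for example by resetting the window whenever a new WAL file is started; that rule is an assumption of this sketch, not something stated above.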




