Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2021/05/11 07:28:11 UTC

[GitHub] [iotdb] zhanglingzhe0820 opened a new pull request #3159: [IOTDB-1357] Compaction use append chunk merge strategy when chunk is already large

zhanglingzhe0820 opened a new pull request #3159:
URL: https://github.com/apache/iotdb/pull/3159


   ## Problem
   
   For level compaction (taking the level compaction in the sequence space as an example), the size of the merged chunk is controlled by two parameters, seq_level_num and seq_file_num_in_each_level: the original chunk is expanded by a factor of seq_file_num_in_each_level^(seq_level_num-1) (see the sketch after the list below). This configuration scheme has the following three problems:
   
   1. To configure these parameters, you need to know the size of the chunks originally written by the user and set the values based on experience. This is hard for users to do on their own; for each user we also have to inspect the files and help with the configuration, which is very inconvenient.
   
   2. If the user keeps the default parameters unchanged or misconfigures them, and the user's scenario does not actually need to compact that many files, compaction wastes a lot of disk IO and may even reduce query efficiency.
   
   3. In some scenarios different time series are written at different speeds, so this compaction method ends up merging many chunks of different sizes: the chunks of slowly written time series stay small while the chunks of quickly written time series grow large, resulting in uneven chunk sizes that are hard to control.
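   
   A minimal, self-contained sketch of the relationship above; the class name and the parameter values are illustrative examples, not necessarily the shipped defaults. Under pure level compaction, a chunk ends up seq_file_num_in_each_level^(seq_level_num-1) times its original size.
   
   ```java
   // Illustrative sketch, not IoTDB source: how the two parameters determine
   // the final chunk expansion factor under pure level compaction.
   public class ExpansionFactorSketch {
     public static void main(String[] args) {
       int seqFileNumInEachLevel = 6; // files merged together at each level (example value)
       int seqLevelNum = 3;           // number of levels in the sequence space (example value)
       long expansion = (long) Math.pow(seqFileNumInEachLevel, seqLevelNum - 1);
       System.out.println("a chunk is expanded up to " + expansion + "x its original size"); // 36x
     }
   }
   ```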
   
   Example:
   
   Assume the target chunk the user needs is 4 times the size of an original chunk, and there are the following files:
   
   Level 0: file1(s1,s2,s2) + file2(s2,s2,s2), i.e. file1 has 1 chunk of s1 and 2 chunks of s2, and file2 has 3 chunks of s2.
   
   Compact to level 1: file3(s1(1), s2(5)). The merged file has 1 chunk of s1 and 1 chunk of s2 that is 5 times the original size. Suppose it is then merged with another file4(s1(1), s2(5)).
   
   Compact to level 2: file5(s1(2), s2(10)). The merged file has a chunk of s1 that is 2 times the original size and a chunk of s2 that is 10 times the original size.
   
   As can be seen, the merged chunks of s2 keep growing and far exceed the chunk size the user needs, while the chunk of s1 still does not meet the user's requirement.
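   
   To make the per-sensor arithmetic explicit, here is a small self-contained sketch that tallies, per sensor, the chunk sizes (in multiples of the original chunk) as the files above are merged. The class and method names are illustrative only, not IoTDB code.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Illustrative sketch of the example above: level compaction sums, per sensor,
   // the chunk sizes (in multiples of the original chunk) of the files it merges.
   public class UnevenChunkGrowthSketch {
     static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
       Map<String, Integer> result = new HashMap<>(a);
       b.forEach((sensor, size) -> result.merge(sensor, size, Integer::sum));
       return result;
     }
   
     public static void main(String[] args) {
       Map<String, Integer> file1 = Map.of("s1", 1, "s2", 2);
       Map<String, Integer> file2 = Map.of("s2", 3);
       Map<String, Integer> file3 = merge(file1, file2);   // {s1=1, s2=5}
       Map<String, Integer> file4 = Map.of("s1", 1, "s2", 5);
       Map<String, Integer> file5 = merge(file3, file4);   // {s1=2, s2=10}
       System.out.println(file5); // s2 far exceeds the 4x target while s1 still falls short
     }
   }
   ```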
   
   ## Solution
   
   Use the configuration parameter merge_chunk_point_number_threshold to control the merging of each chunk: if all the chunks of a sensor in the file list to be merged have already reached this threshold, those chunks are no longer merged point by point; instead, each chunk is read out and written (appended) directly to the new file.
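   
   A minimal, self-contained sketch of this decision (not the actual IoTDB code path; the class, method, and threshold value below are illustrative): a sensor's chunks are appended unchanged only when every one of them has already reached the threshold, otherwise they are rewritten into a larger chunk as before.
   
   ```java
   import java.util.List;
   
   // Illustrative sketch of the append-vs-rewrite decision this PR introduces.
   // A "chunk" is reduced here to its point count; all names are placeholders.
   public class AppendMergeSketch {
   
     static final long MERGE_CHUNK_POINT_NUMBER_THRESHOLD = 100_000; // example value
   
     /** True when every chunk of the sensor already reached the threshold,
      *  i.e. the chunks should be appended to the target file unchanged. */
     static boolean shouldAppendWithoutMerging(List<Long> chunkPointCounts) {
       return chunkPointCounts.stream()
           .allMatch(points -> points >= MERGE_CHUNK_POINT_NUMBER_THRESHOLD);
     }
   
     public static void main(String[] args) {
       // chunks already large: append them directly, no point-level rewrite
       System.out.println(shouldAppendWithoutMerging(List.of(120_000L, 150_000L))); // true
       // one chunk still small: deserialize and merge into a bigger chunk as before
       System.out.println(shouldAppendWithoutMerging(List.of(3_000L, 120_000L)));   // false
     }
   }
   ```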
   
   More information about this feature is available on the [confluence page](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177047564).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [iotdb] LebronAl merged pull request #3159: [IOTDB-1357] Compaction use append chunk merge strategy when chunk is already large

Posted by GitBox <gi...@apache.org>.
LebronAl merged pull request #3159:
URL: https://github.com/apache/iotdb/pull/3159


   

