Posted to dev@iotdb.apache.org by 445073309 <su...@foxmail.com> on 2020/07/20 04:42:31 UTC

add vm(hot compaction) in tsfile processor

Hi,


I ran into a problem: when there are not enough memtables, IoTDB writes small chunks, which makes queries on hot data slower.


So I created a new type of file, the vm file, and use it to do hot compaction in the flush processor. With this, we can flexibly control the size of each chunk. The configuration and usage changes are as follows:
* add a new parameter enable_vm in iotdb-engine.properties: indicates whether to use vm (virtual memory) files
* use the parameter avg_series_point_number_threshold in iotdb-engine.properties: indicates the minimum average number of data points per chunk after hot compaction
* add a new parameter max_vm_num in iotdb-engine.properties: indicates the maximum number of vm files that a TsFileProcessor can have
* add a new parameter max_merge_chunk_num_in_tsfile in iotdb-engine.properties: indicates the maximum number of times vm files are merged
* the suffix of a vm file is '.vm', and the naming convention is {tsfile_name}-{level}-{timestamp}.vm
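
As an illustration of the naming convention above, here is a minimal Java sketch that splits a vm file name into its parts. The class and method names (VmFileNameSketch, VmFileName, parse) and the sample file name are hypothetical and are not taken from the IoTDB code base. The sketch parses from the right because it assumes the tsfile name may itself contain '-'.

```java
import java.util.Optional;

public class VmFileNameSketch {

    /** Hypothetical holder for the three parts of a vm file name. */
    static final class VmFileName {
        final String tsFileName;
        final int level;
        final long timestamp;

        VmFileName(String tsFileName, int level, long timestamp) {
            this.tsFileName = tsFileName;
            this.level = level;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return "tsfile=" + tsFileName + ", level=" + level + ", timestamp=" + timestamp;
        }
    }

    /**
     * Parses "{tsfile_name}-{level}-{timestamp}.vm".
     * Returns Optional.empty() if the name does not follow the convention.
     */
    static Optional<VmFileName> parse(String fileName) {
        final String suffix = ".vm";
        if (!fileName.endsWith(suffix)) {
            return Optional.empty();
        }
        String stem = fileName.substring(0, fileName.length() - suffix.length());
        // Work from the right: the last '-' precedes the timestamp,
        // the one before it precedes the level.
        int timestampSep = stem.lastIndexOf('-');
        if (timestampSep <= 0) {
            return Optional.empty();
        }
        int levelSep = stem.lastIndexOf('-', timestampSep - 1);
        if (levelSep <= 0) {
            return Optional.empty();
        }
        try {
            int level = Integer.parseInt(stem.substring(levelSep + 1, timestampSep));
            long timestamp = Long.parseLong(stem.substring(timestampSep + 1));
            return Optional.of(new VmFileName(stem.substring(0, levelSep), level, timestamp));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up file name, used only to exercise the parser.
        System.out.println(parse("1595220000000-1-0.tsfile-2-1595220151000.vm"));
        System.out.println(parse("not-a-vm-file.tsfile"));
    }
}
```

Names that do not end in '.vm', or whose level or timestamp is not numeric, yield an empty Optional instead of an exception.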


There are also many detailed changes, such as:
* each TsFileProcessor gets a vm file list, List<List<TsFileResource>> vmTsFileResources, and a List<List<RestorableTsFileIOWriter>> vmWriters to manage the files
* the recovery process now also recovers vm files, and they are injected into the corresponding TsFileProcessor after recovery

The compaction strategy is currently written like LeveledCompactionStrategy in Cassandra, and it can be optimized later.


I put the detail zh-doc in the attachment.

Thanks,
--
Lingzhe Zhang
School of Software, Tsinghua University

张凌哲
清华大学 软件学院

Re: add vm(hot compaction) in tsfile processor

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi,


Thanks Lingzhe, this feature could improve query performance a lot. max_vm_num limits the maximum number of vm files in each level.


The max_vm_num is 10 by default and the max_merge_chunk_num_in_tsfile is 100 now. 
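
A sketch of how these parameters could be read with plain java.util.Properties, using the defaults quoted above (10 and 100) as fallbacks. The class name VmConfigSketch and the enable_vm fallback of false are assumptions of this sketch. This is not IoTDB's own configuration loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class VmConfigSketch {
    final boolean enableVm;
    final int maxVmNum;
    final int maxMergeChunkNumInTsfile;

    VmConfigSketch(Properties p) {
        // Fallbacks 10 and 100 are the defaults quoted in this message.
        // The enable_vm fallback (false) is an assumption of this sketch.
        enableVm = Boolean.parseBoolean(p.getProperty("enable_vm", "false").trim());
        maxVmNum = Integer.parseInt(p.getProperty("max_vm_num", "10").trim());
        maxMergeChunkNumInTsfile =
            Integer.parseInt(p.getProperty("max_merge_chunk_num_in_tsfile", "100").trim());
    }

    /** Parses properties-file text, e.g. the contents of iotdb-engine.properties. */
    static VmConfigSketch fromText(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new VmConfigSketch(p);
    }

    public static void main(String[] args) {
        VmConfigSketch c = fromText("enable_vm=true\nmax_vm_num=5\n");
        // max_merge_chunk_num_in_tsfile is not set, so the fallback applies.
        System.out.println(c.enableVm + " " + c.maxVmNum + " " + c.maxMergeChunkNumInTsfile);
        // prints: true 5 100
    }
}
```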


Besides, I can't see the figures you attached...



Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

----- Original Message -----
From: 445073309 <su...@foxmail.com>
Sent: 2020-07-20 15:30:46 (Monday)
To: dev <de...@iotdb.apache.org>
Cc:
Subject: Re: add vm(hot compaction) in tsfile processor


Hi,

max_vm_num is the maximum number of vm files at each level of a tsfile.
For example, if we set max_vm_num=5 and flush 11 times, the compaction procedure can be described as below:
* when we flush 5 (max_vm_num) times, the current level is compacted into the next level: [figure not shown]
* when we flush all 11 times, the compaction procedure is: [figure not shown]
* if we close the tsfile, the whole compaction procedure will be: [figure not shown]


The default is max_vm_num=5 in the current version. If users do not know which value is suitable, the default value is enough to make chunks larger.
Best,
-----------------------------------
Lingzhe Zhang
School of Software, Tsinghua University

张凌哲
清华大学 软件学院






Re: add vm(hot compaction) in tsfile processor

Posted by 445073309 <su...@foxmail.com>.
Hi,


OK, I will not use any rich format.


> Suppose the parameter is 5. Then in level 2, will you merge 4 new VM files
> to the bigger one, or merge 5 VM files?
I will merge 5 VM files into a bigger one in level 2.





------------------ Original Message ------------------
From: "dev" <sainthxd@gmail.com>;
Sent: Monday, July 20, 2020, 4:03 PM
To: "dev" <dev@iotdb.apache.org>;

Subject: Re: add vm(hot compaction) in tsfile processor



Hi Lingzhe,

Suggest you give up your email client...

Or, do not use any rich-format in the mailing list.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院



Re: add vm(hot compaction) in tsfile processor

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi Lingzhe,

Suggest you give up your email client...

Or, do not use any rich-format in the mailing list.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


On Mon, Jul 20, 2020 at 3:47 PM, 445073309 <su...@foxmail.com> wrote:

> Hi,
>
>
> I converted the figures to ASCII diagrams.
> > * when we flush 5 (max_vm_num) times, the current level will do compaction
> > to the next level
> 1 1 1 1 1
> | / / / /
> 5
> > * when we flush all 11 times, the compaction procedure is
> 1 1 1 1 1   1 1 1 1 1   1
> | / / / /   | / / / /
> 5           5
> > * if we close the tsfile, the whole compaction procedure will be
> 1 1 1 1 1   1 1 1 1 1   1
> | / / / /   | / / / /   /
> 5           5          /
> |          /          /
> 11
>
>
> > Suppose the parameter is 5. Then in level 2, will you merge 4 new VM files
> > to the bigger one, or merge 5 VM files?
> I will merge 5 VM files into a bigger one in level 2.
>
>
> -----------------------------------
> Lingzhe Zhang
> School of Software, Tsinghua University
>
> 张凌哲
> 清华大学 软件学院

Re: add vm(hot compaction) in tsfile processor

Posted by 445073309 <su...@foxmail.com>.
Hi,


I converted the figures to ASCII diagrams.
> * when we flush 5 (max_vm_num) times, the current level will do compaction
> to the next level
1 1 1 1 1
| / / / /
5
> * when we flush all 11 times, the compaction procedure is
1 1 1 1 1   1 1 1 1 1   1
| / / / /   | / / / /
5           5
> * if we close the tsfile, the whole compaction procedure will be
1 1 1 1 1   1 1 1 1 1   1
| / / / /   | / / / /   /
5           5          /
|          /          /
11


> Suppose the parameter is 5. Then in level 2, will you merge 4 new VM files
> to the bigger one, or merge 5 VM files?
I will merge 5 VM files into a bigger one in level 2.
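
The diagrams above can be reproduced with a small simulation. This is only a sketch of the leveled merging as described in this thread, not the actual TsFileProcessor code. The class VmCompactionSketch is hypothetical, and each vm file is represented only by the number of flushes it contains. The sketch rejects max_vm_num < 2 because its own cascade would otherwise never terminate; how IoTDB treats such a setting is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class VmCompactionSketch {
    private final int maxVmNum;
    // levels.get(i) holds the sizes (counted in flushes) of the vm files at level i.
    private final List<List<Integer>> levels = new ArrayList<>();

    VmCompactionSketch(int maxVmNum) {
        if (maxVmNum < 2) {
            // Limitation of this sketch only: with 1, every merge would trigger another.
            throw new IllegalArgumentException("this sketch needs max_vm_num >= 2");
        }
        this.maxVmNum = maxVmNum;
    }

    /** One memtable flush produces one vm file of size 1 at level 0. */
    void flush() {
        add(0, 1);
    }

    private void add(int level, int size) {
        while (levels.size() <= level) {
            levels.add(new ArrayList<>());
        }
        List<Integer> files = levels.get(level);
        files.add(size);
        // Once a level holds max_vm_num files, merge them all into one file
        // at the next level (which may cascade further).
        if (files.size() == maxVmNum) {
            int merged = 0;
            for (int s : files) {
                merged += s;
            }
            files.clear();
            add(level + 1, merged);
        }
    }

    /** Closing the tsfile merges every remaining vm file into one; returns its size. */
    int close() {
        int total = 0;
        for (List<Integer> files : levels) {
            for (int s : files) {
                total += s;
            }
            files.clear();
        }
        return total;
    }

    List<List<Integer>> levels() {
        return levels;
    }

    public static void main(String[] args) {
        VmCompactionSketch c = new VmCompactionSketch(5);
        for (int i = 0; i < 11; i++) {
            c.flush();
        }
        System.out.println(c.levels()); // [[1], [5, 5]]
        System.out.println(c.close());  // 11
    }
}
```

With max_vm_num=5 and 11 flushes, the sketch ends with one unmerged file at level 0 and two merged files of size 5 at level 1, matching the second diagram. Closing then yields a single file of size 11, matching the third.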


-----------------------------------
Lingzhe Zhang
School of Software, Tsinghua University

张凌哲
清华大学 软件学院
------------------ Original Message ------------------
From: "dev" <sainthxd@gmail.com>;
Sent: Monday, July 20, 2020, 3:39 PM
To: "dev" <dev@iotdb.apache.org>;

Subject: Re: add vm(hot compaction) in tsfile processor



Hi,

Did you attach some figures? The mailing list does not allow figures..

Suppose the parameter is 5. Then in level 2, will you merge 4 new VM files
to the bigger one, or merge 5 VM files?

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院



Re: add vm(hot compaction) in tsfile processor

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi,

Did you attach some figures? The mailing list does not allow figures..

Suppose the parameter is 5. Then in level 2, will you merge 4 new VM files
to the bigger one, or merge 5 VM files?

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


On Mon, Jul 20, 2020 at 3:31 PM, 445073309 <su...@foxmail.com> wrote:

> Hi,
>
> max_vm_num is the maximum number of vm files at each level of a tsfile.
> For example, if we set max_vm_num=5 and flush 11 times, the compaction
> procedure can be described as below:
> * when we flush 5 (max_vm_num) times, the current level is compacted into
> the next level: [figure not shown]
> * when we flush all 11 times, the compaction procedure is: [figure not shown]
> * if we close the tsfile, the whole compaction procedure will be: [figure not shown]
>
>
> The default is max_vm_num=5 in the current version. If users do not know
> which value is suitable, the default value is enough to make chunks larger.
> Best,
> -----------------------------------
> Lingzhe Zhang
> School of Software, Tsinghua University
>
> 张凌哲
> 清华大学 软件学院

Re: add vm(hot compaction) in tsfile processor

Posted by 445073309 <su...@foxmail.com>.
Hi,

max_vm_num is the maximum number of vm files at each level of a tsfile.
For example, if we set max_vm_num=5 and flush 11 times, the compaction procedure can be described as below:
* when we flush 5 (max_vm_num) times, the current level is compacted into the next level: [figure not shown]

* when we flush all 11 times, the compaction procedure is: [figure not shown]

* if we close the tsfile, the whole compaction procedure will be: [figure not shown]


The default is max_vm_num=5 in the current version. If users do not know which value is suitable, the default value is enough to make chunks larger.
Best,
-----------------------------------
Lingzhe Zhang
School of Software, Tsinghua University

张凌哲
清华大学 软件学院




------------------ Original Message ------------------
From: "dev" <sainthxd@gmail.com>;
Sent: Monday, July 20, 2020, 3:10 PM
To: "dev" <dev@iotdb.apache.org>;

Subject: Re: add vm(hot compaction) in tsfile processor



Hi Lingzhe,

> max_vm_num: indicates that a TsFileProcessor has at most the number of
> virtual memory files

what does this mean? and how do I know what value is suitable? (For
example, if I set it as 1, is there any impact?)

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院



Re: add vm(hot compaction) in tsfile processor

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi Lingzhe,

>max_vm_num: indicates that a TsFileProcessor has at most the number of
virtual memory files

what does this mean? and how do I know what value is suitable? (For
example, if I set it as 1, is there any impact?)

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院

