Posted to dev@iotdb.apache.org by 江天 <jt...@163.com> on 2019/07/08 06:51:16 UTC

About the design and development of merge.

Greetings,

Although the new storage engine is online, some old functionality has been lost, for example, merge. As you may know, the current storage engine consists of two kinds of data files: sequential files and unsequential files. A data point with a given timestamp, say t0, may occur zero or one time in the sequential files and zero or more times in the unsequential files. During a query, all data with the same timestamp are read from the files and only the newest version (the latest inserted one) is returned as a result. This means outdated data in old files may degrade query performance. Besides, keeping data in different files incurs more disk seeks per query, which significantly hurts performance.

To avoid these disadvantages of keeping data scattered disorderly across files, we introduce a process called merge (also called compaction in other LSM systems) that reads and rewrites the data in multiple time-overlapping files into a new file that preserves time order better and contains no duplicated data.
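The version-resolution rule above (for a given timestamp, the latest-written value wins) can be sketched with plain Java collections. This is only an illustrative model, not IoTDB's actual reader code:

```java
// Illustrative model of version resolution during a merge: a TreeMap keeps
// timestamps ordered, and putting unsequential (newer) data after
// sequential (older) data overwrites the stale versions.
import java.util.TreeMap;

public class DedupSketch {
    // seq and unseq hold {timestamp, value} pairs; unseq entries are newer
    // and therefore overwrite seq entries with the same timestamp.
    static TreeMap<Long, Long> merge(long[][] seq, long[][] unseq) {
        TreeMap<Long, Long> result = new TreeMap<>();
        for (long[] p : seq) result.put(p[0], p[1]);
        for (long[] p : unseq) result.put(p[0], p[1]); // newer version wins
        return result;
    }

    public static void main(String[] args) {
        long[][] seq = {{1, 10}, {2, 20}, {3, 30}};
        long[][] unseq = {{2, 25}, {4, 40}};
        System.out.println(merge(seq, unseq)); // {1=10, 2=25, 3=30, 4=40}
    }
}
```

The merged map is both deduplicated and time-ordered, which is exactly the property the new file should have.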

Providing an efficient way to make data more compact is no easy task. If you are interested or have studied compaction in other LSM systems, please join the discussion in this thread and share your advice.

Many thanks,

Tian Jiang

Re: About the design and development of merge.

Posted by 江天 <jt...@163.com>.
Hi,

I have done some simple performance tests on merge to find what makes up its main time consumption. Here is the result:

query unseq data: 26.8% (next) + 6.1% (query metadata) + 18.3% (init reader)

query seq data: 20% (next) + 10% (metadata)

write chunk in memory: 9.7%


Clearly, writing is not the bottleneck of a merge, and querying unseq data takes significantly more time than querying seq data (in this experiment, the seq and unseq data are of equal size). Overall, the time spent reading unseq data, reading seq data, and writing merged data is about 5 : 3 : 1. To speed up merge, speeding up querying is the first concern.



> 在 2019年7月18日,上午9:02,江天 <jt...@163.com> 写道:
> 
> Hi,
> 
> The merge is functional on branch dev_merge and the related PR is https://github.com/apache/incubator-iotdb/pull/258. If you are interested, feel free to give any advice or comments.
> 
> Thanks
> 
> Tian Jiang


Re: About the design and development of merge.

Posted by 江天 <jt...@163.com>.
Hi,

The merge is functional on branch dev_merge and the related PR is https://github.com/apache/incubator-iotdb/pull/258. If you are interested, feel free to give any advice or comments.

Thanks

Tian Jiang

Re: About the design and development of merge.

Posted by 江天 <jt...@163.com>.
Merge log for fast recovery after system failure:
1. At the beginning of a merge, a file named “merge.log” is created.
2. All files’ names involved in this merge are written into “merge.log”.
3. A line “merge start” is written into “merge.log” indicating that all files in this merge are listed above.
4. When a time series “ts_i” begins to merge, a line “ts_i start” is written into “merge.log”.
5. When a time series “ts_i” finishes merging, for each sequential file “seqFile_j” that was merged while processing the time series, a line “seqFile_j end {current length of seqFile_j’s temp file}” is recorded, and then a line “ts_i end” is written into “merge.log”, indicating that “ts_i” is merged and the file-length changes above are now valid.
6. When all time series are merged, a line “all ts end” is written, indicating that no more time series need merging and all temp files are valid; merged time series can be skipped in recovery.
7. When a sequential file “seqFile_j” starts merging with its temp file, a line “seqFile_j start {current length of seqFile_j}” is written.
8. When a sequential file “seqFile_j” finishes merging with its temp file, a line “seqFile_j end” is written, so this sequential file can be skipped in recovery.
9. When all sequential files have been merged with their temp files, a final line “merge end” is written.
10. When all unsequential files have been removed, “merge.log” is also removed.
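The logging protocol above can be sketched as follows. The class and method names are illustrative stand-ins, not IoTDB's actual implementation; each call appends one line that a recovery routine could later replay:

```java
// Hypothetical sketch of the merge.log protocol: every state transition
// appends one line, so recovery can replay the log and skip finished work.
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;

public class MergeLogger {
    private final PrintWriter out;

    public MergeLogger(java.io.Writer w) { out = new PrintWriter(w, true); }

    // Steps 2-3: list every file in the merge, then mark the list complete.
    public void logMergeStart(List<String> files) {
        files.forEach(out::println);
        out.println("merge start");
    }
    public void logTsStart(String ts) { out.println(ts + " start"); }         // step 4
    // Step 5: record each temp file's valid length, then close the series.
    public void logTsEnd(String ts, String seqFile, long tempLen) {
        out.println(seqFile + " end " + tempLen);
        out.println(ts + " end");
    }
    public void logAllTsEnd() { out.println("all ts end"); }                  // step 6
    public void logFileStart(String seqFile, long len) {                      // step 7
        out.println(seqFile + " start " + len);
    }
    public void logFileEnd(String seqFile) { out.println(seqFile + " end"); } // step 8
    public void logMergeEnd() { out.println("merge end"); }                   // step 9

    public static void main(String[] args) {
        StringWriter log = new StringWriter(); // stands in for merge.log
        MergeLogger logger = new MergeLogger(log);
        logger.logMergeStart(List.of("seqFile_1", "unseqFile_1"));
        logger.logTsStart("ts_1");
        logger.logTsEnd("ts_1", "seqFile_1", 4096);
        logger.logAllTsEnd();
        logger.logFileStart("seqFile_1", 8192);
        logger.logFileEnd("seqFile_1");
        logger.logMergeEnd();
        System.out.print(log);
    }
}
```

On recovery, a reader would scan for the last complete marker (“ts_i end”, “seqFile_j end”, or “merge end”) and resume from there; step 10's deletion of “merge.log” marks the whole merge as committed.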

Re: About the design and development of merge.

Posted by 江天 <jt...@163.com>.
A1: Yes. However, in IoTDB queries the order of chunks is not that important, as a simple sort by startTime can make them ordered again once they are loaded into memory.
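The re-ordering mentioned in A1 can be sketched like this; ChunkMeta is an illustrative stand-in for IoTDB's chunk metadata class, not the real type:

```java
// Hypothetical sketch: if a merge changes the on-disk order of chunks,
// one comparator-based sort on start time restores time order once the
// chunk metadata is loaded into memory.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ChunkSortSketch {
    static class ChunkMeta {
        final String id;
        final long startTime;
        ChunkMeta(String id, long startTime) { this.id = id; this.startTime = startTime; }
    }

    // Restore time order among loaded chunks with a single sort.
    static List<ChunkMeta> sortByStartTime(List<ChunkMeta> loaded) {
        List<ChunkMeta> sorted = new ArrayList<>(loaded);
        sorted.sort(Comparator.comparingLong(c -> c.startTime));
        return sorted;
    }

    public static void main(String[] args) {
        List<ChunkMeta> chunks = List.of(
                new ChunkMeta("c2", 100), new ChunkMeta("c1", 0), new ChunkMeta("c3", 200));
        for (ChunkMeta c : sortByStartTime(chunks)) System.out.println(c.id);
        // prints c1, then c2, then c3
    }
}
```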

A2: Yes, but not necessarily. We handle time series one by one to minimize the memory burden, but if memory is abundant we can handle several time series at the same time. Nevertheless, handling multiple time series at once does not seem to bring any significant advantage. Eventually all data in the unsequential file is merged, so no marks need to be left. For the case of system failure, I have also designed a merge log for fast recovery, which I will explain in the next mail.

> 在 2019年7月8日,下午4:10,Xiangdong Huang <sa...@gmail.com> 写道:
> 
> Hi,
> 
> Q1: you change the order of Chunks in a TsFile. Does that break the time-ordering characteristic?
> 
> Q2: Do you mean handling time series one by one (An unseq file may consist of many devices and measurements)?  Do we need to make some marks on an unseq file if only the data of a part of devices are merged?
> 
> Best.
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
> 
>  黄向东
> 清华大学 软件学院
> 
> 
> 江天 <jt2594838@163.com <ma...@163.com>> 于2019年7月8日周一 下午3:49写道:
> I have worked out a simple solution for merge. The following picture explains the basic procedure of a merge; if you cannot see the picture, please visit http://assets.processon.com/chart_image/5d22a081e4b0878e40a8a4ae.png. The procedure consists of 6 steps:
> 1. Build a MergeReader over the unsequential files.
> 2. Read data from the MergeReader by time-ascending order.
> 3. For each unsequential data point, find its corresponding chunk in sequential file.
> 4. Merge unsequential data points with their corresponding sequential chunk to generate a new chunk in a new file.
> 5.1 If unmerged chunks are the minority, append unmerged chunks to the new file and generate metadata for it.
> 5.2 If merged chunks are the minority, remove metadata in the old file, append merged chunks to the old file and re-generate metadata for it, ignoring the chunks that have been merged in the old file.
> 6. Use the new file to replace the old file and remove useless files.
> 
> 
> 


Re: About the design and development of merge.

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi,

Q1: you change the order of Chunks in a TsFile. Does that break the
time-ordering characteristic?

Q2: Do you mean handling time series one by one (An unseq file may consist
of many devices and measurements)?  Do we need to make some marks on an
unseq file if only the data of a part of devices are merged?

Best.
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


江天 <jt...@163.com> 于2019年7月8日周一 下午3:49写道:

> I have worked out a simple solution of merge, the following picture
> explains the basic procedure of a merge, if you cannot see the picture,
> please visit
> http://assets.processon.com/chart_image/5d22a081e4b0878e40a8a4ae.png. The
> procedure consists of 6 steps:
> 1. Build a MergeReader over the unsequential files.
> 2. Read data from the MergeReader by time-ascending order.
> 3. For each unsequential data point, find its corresponding chunk in
> sequential file.
> 4. Merge unsequential data points with their corresponding sequential chunk
> to generate a new chunk in a new file.
> 5.1 If unmerged chunks are the minority, append unmerged chunks to the new
> file and generate metadata for it.
> 5.2 If merged chunks are the minority, remove metadata in the old file,
> append merged chunks to the old file and re-generate metadata for it,
> ignoring the chunks that have been merged in the old file.
> 6. Use the new file to replace the old file and remove useless files.
>
>
>
>

Re: About the design and development of merge.

Posted by 江天 <jt...@163.com>.
I have worked out a simple solution for merge. The following picture explains the basic procedure of a merge; if you cannot see the picture, please visit http://assets.processon.com/chart_image/5d22a081e4b0878e40a8a4ae.png. The procedure consists of 6 steps:
1. Build a MergeReader over the unsequential files.
2. Read data from the MergeReader in time-ascending order.
3. For each unsequential data point, find its corresponding chunk in the sequential file.
4. Merge unsequential data points with their corresponding sequential chunk to generate a new chunk in a new file.
5.1 If unmerged chunks are the minority, append the unmerged chunks to the new file and generate metadata for it.
5.2 If merged chunks are the minority, remove the metadata in the old file, append the merged chunks to the old file, and re-generate metadata for it, ignoring the chunks that have been merged in the old file.
6. Use the new file to replace the old file and remove the useless files.
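The 5.1/5.2 choice above can be sketched as a tiny decision function; the enum and method names are illustrative, not IoTDB's actual code:

```java
// Hypothetical sketch of steps 5.1/5.2: the rewrite direction is chosen so
// that the minority set of chunks is the one that gets copied, minimizing
// the number of rewritten bytes.
public class RewriteDirectionSketch {
    enum Direction { APPEND_UNMERGED_TO_NEW_FILE, APPEND_MERGED_TO_OLD_FILE }

    static Direction choose(int mergedChunks, int unmergedChunks) {
        // 5.1: few unmerged chunks -> copy them into the new file.
        // 5.2: few merged chunks   -> append them to the old file and
        //      regenerate its metadata, skipping superseded chunks.
        return unmergedChunks <= mergedChunks
                ? Direction.APPEND_UNMERGED_TO_NEW_FILE
                : Direction.APPEND_MERGED_TO_OLD_FILE;
    }

    public static void main(String[] args) {
        System.out.println(choose(90, 10)); // APPEND_UNMERGED_TO_NEW_FILE
        System.out.println(choose(10, 90)); // APPEND_MERGED_TO_OLD_FILE
    }
}
```

Either way, step 6 then swaps in the surviving file and deletes the rest, so the cheaper rewrite direction only changes which file survives.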