Posted to dev@iotdb.apache.org by Haiming Zhu <he...@gmail.com> on 2022/05/09 09:19:31 UTC

Change semantics of TsFile filename

Hi everyone,

Currently, the filename format of each TsFile is
{file_created_time}-{version_id}-{inner_space_merge_num}-{cross_space_merge_num}.tsfile.
Within one time partition, the order of TsFiles is guaranteed by
version_id; for example, 1651825804093-2-0-0.tsfile comes after
1651825804092-1-0-0.tsfile.
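
To make the current rule concrete, here is a minimal Python sketch that
parses a filename and orders files by version_id. The names TsFileName,
parse_tsfile_name and sort_by_version are hypothetical and only serve as
an illustration; they are not taken from the IoTDB code base.

```python
from typing import List, NamedTuple


class TsFileName(NamedTuple):
    """Fields of a TsFile name, in the order they appear in the name."""
    file_created_time: int
    version_id: int
    inner_space_merge_num: int
    cross_space_merge_num: int


SUFFIX = ".tsfile"


def parse_tsfile_name(name: str) -> TsFileName:
    """Split '{time}-{version}-{inner}-{cross}.tsfile' into its four fields."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a tsfile name: {name!r}")
    parts = name[: -len(SUFFIX)].split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dash-separated fields: {name!r}")
    return TsFileName(*(int(p) for p in parts))


def sort_by_version(names: List[str]) -> List[str]:
    """Current semantics: order files in a time partition by version_id only."""
    return sorted(names, key=lambda n: parse_tsfile_name(n).version_id)


# The example from the paragraph above: version 1 sorts before version 2.
ordered = sort_by_version(
    ["1651825804093-2-0-0.tsfile", "1651825804092-1-0-0.tsfile"]
)
```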

The problem is that filename conflicts may occur in the cross-space
compaction and load scenarios. In cross-space compaction, assume that
3-2-0-0.tsfile, 4-3-0-0.tsfile and 5-5-0-0.tsfile exist in the sequence
folder. If 4-3-0-0.tsfile is selected, compaction cannot generate 3 or
more target files, because only 2 version_ids (3 and 4) are available
between 2 and 5, so some big target files may be generated. In load,
assume that 3-2-0-0.tsfile and 3-3-0-0.tsfile exist in the sequence
folder. Because version_ids 2 and 3 are adjacent, no more sequence files
can be loaded between 3-2-0-0.tsfile and 3-3-0-0.tsfile; they can only
be loaded into the unsequence folder.
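
The arithmetic behind both cases can be sketched in a few lines of Python.
The helper free_version_ids is hypothetical; it only counts the integers
strictly between two neighbouring version_ids, assuming a selected file's
own version_id can be reused by its target files.

```python
from typing import List


def free_version_ids(lower: int, upper: int) -> List[int]:
    """Return the version_ids strictly between two neighbouring files."""
    return list(range(lower + 1, upper))


# Cross-space compaction: neighbours have version_id 2 and 5, and the
# selected file (version_id 3) is replaced, so 3 and 4 are usable.
compaction_slots = free_version_ids(2, 5)

# Load: neighbours have version_id 2 and 3, so nothing fits in between.
load_slots = free_version_ids(2, 3)

# At most len(compaction_slots) target files can be named without a clash.
max_target_files = len(compaction_slots)
```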

In response to these problems, the format won't be changed, but the meaning
of file_created_time and version_id will be. Instead of version_id, we use
file_created_time to guarantee the order of TsFiles, and if two TsFiles
have the same file_created_time, we use version_id to break the tie. This
semantic change may affect the query, compaction and load modules.
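
As a rough illustration of the proposed ordering, the Python sketch below
sorts by (file_created_time, version_id). The function names are
hypothetical and the filenames are made up for the example; this is not
the actual implementation.

```python
from typing import List, Tuple

SUFFIX = ".tsfile"


def proposed_order_key(name: str) -> Tuple[int, int]:
    """Proposed semantics: file_created_time first, version_id as tie-breaker."""
    fields = name[: -len(SUFFIX)].split("-")
    return int(fields[0]), int(fields[1])


def current_order_key(name: str) -> int:
    """Current semantics, for comparison: version_id only."""
    return int(name[: -len(SUFFIX)].split("-")[1])


def sort_proposed(names: List[str]) -> List[str]:
    return sorted(names, key=proposed_order_key)


files = ["5-1-0-0.tsfile", "3-3-0-0.tsfile", "3-2-0-0.tsfile"]

# Proposed: 3-2 and 3-3 share a created time and are ordered by version_id;
# 5-1 comes last because its created time is larger.
by_proposed = sort_proposed(files)

# Current: 5-1 would come first because its version_id is the smallest.
by_current = sorted(files, key=current_order_key)
```

Comparing by_proposed with by_current shows where the two rules disagree,
which is the kind of case the query, compaction and load modules would
need to handle.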

Suggestions are welcome.

Best,
------------------------------------
Haiming Zhu
School of Software, Tsinghua University

朱海铭
清华大学 软件学院