
Posted to dev@iotdb.apache.org by Dawei Liu <at...@163.com> on 2020/02/24 08:40:05 UTC

[DISCUSS] Optimize TsFile structure to reduce unnecessary IO

Hi,

In the current TsFile structure, each PageHeader and its PageData are stored contiguously inside a Chunk, forming a chain-like structure [1].

Currently, the basic unit we read from disk is the whole Chunk.
For "device * sensor" queries, this means we often read more data than we need,
so we are considering a new optimization:
use the PageHeaders to filter the data first, so that we know precisely which Pages need to be read.
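To make the idea concrete, here is a minimal Java sketch of page-level pruning. It assumes that each PageHeader records the time range of its page. The `PageHeader` record, its fields, and `pagesToRead` are hypothetical simplifications for illustration, not the actual TsFile classes.

```java
import java.util.ArrayList;
import java.util.List;

public class PageFilterSketch {

    // Hypothetical, simplified PageHeader: only the fields needed to prune pages
    // by time range. A real TsFile PageHeader carries more (sizes, statistics).
    record PageHeader(long startTime, long endTime, long offset, int compressedSize) {}

    // Returns the headers of pages whose [startTime, endTime] overlaps the
    // query range [queryStart, queryEnd]; only these pages need to be read.
    static List<PageHeader> pagesToRead(List<PageHeader> headers, long queryStart, long queryEnd) {
        List<PageHeader> selected = new ArrayList<>();
        for (PageHeader h : headers) {
            boolean overlaps = h.endTime() >= queryStart && h.startTime() <= queryEnd;
            if (overlaps) {
                selected.add(h);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<PageHeader> headers = List.of(
            new PageHeader(0, 99, 0, 512),
            new PageHeader(100, 199, 512, 512),
            new PageHeader(200, 299, 1024, 512));
        List<PageHeader> hits = pagesToRead(headers, 150, 180);
        // Only the second page overlaps [150, 180].
        System.out.println(hits.size() + " page(s), offset " + hits.get(0).offset());
    }
}
```

Running `main` prints `1 page(s), offset 512`: only the second page overlaps the query range, so the other two pages never have to be read from disk.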

However, we are still debating where to put the PageHeaders:

1. Put the PageHeaders into the ChunkMetaData.
The advantage is that we can start filtering pages as soon as the ChunkMetaData has been read.

2. Put the PageHeaders into the ChunkHeader.
This requires one extra read to fetch the PageHeaders, but it saves memory when we load a device's ChunkMetaData list.
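The trade-off between the two options can be sketched as a small cost model. This is a hypothetical illustration: the byte sizes below are made-up placeholders rather than measured TsFile sizes, and the model only counts the metadata held in memory and the extra header reads, ignoring caching and the page data itself.

```java
public class LayoutTradeoffSketch {

    // Hypothetical sizes in bytes, for illustration only; real sizes depend
    // on data type, encoding, and the statistics TsFile stores.
    static final long CHUNK_METADATA_BYTES = 64;
    static final long PAGE_HEADER_BYTES = 48;

    // Memory needed to hold one device's ChunkMetaData list.
    // Option 1 embeds every PageHeader in its ChunkMetaData; option 2 does not.
    static long metadataMemory(int option, long chunks, long pagesPerChunk) {
        long perChunk = CHUNK_METADATA_BYTES;
        if (option == 1) {
            perChunk += pagesPerChunk * PAGE_HEADER_BYTES;
        }
        return chunks * perChunk;
    }

    // Extra reads, per queried chunk, before we can decide which pages to load:
    // option 2 must first read the PageHeaders stored in the ChunkHeader.
    static int extraHeaderReads(int option) {
        return option == 1 ? 0 : 1;
    }

    public static void main(String[] args) {
        long chunks = 1000, pagesPerChunk = 10;
        for (int option = 1; option <= 2; option++) {
            System.out.printf("option %d: %d bytes of metadata, %d extra read(s) per chunk%n",
                option, metadataMemory(option, chunks, pagesPerChunk), extraHeaderReads(option));
        }
    }
}
```

With these placeholder sizes, option 1 keeps 544000 bytes of metadata in memory with no extra reads, while option 2 keeps 64000 bytes but pays one extra read per queried chunk. The gap grows with the number of pages per chunk.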

For details, please see [2].

What do you think?



Regards,
---
Dawei Liu


[1] https://user-images.githubusercontent.com/33376433/69341240-26012300-0ca4-11ea-91a1-d516810cad44.png
[2] https://issues.apache.org/jira/secure/attachment/12994279/131582515824_.pic_hd.jpg

Re: [DISCUSS] Optimize TsFile structure to reduce unnecessary IO

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.

Hi,

Maybe we should put the PageHeaders into the ChunkHeader. If we put them into the ChunkMetadata, we could no longer read a TsFile sequentially.
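The sequential-read argument can be illustrated with a self-describing chunk. Assuming ChunkMetadata lives in the index written after the chunk data, a reader scanning the file from the front would not know page boundaries if only the ChunkMetadata held the PageHeaders. If the ChunkHeader lists them, each chunk can be decoded front to back on its own. The byte layout, `writeChunk`, and `readChunkSequentially` below are hypothetical and do not reflect the real TsFile encoding.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SequentialChunkSketch {

    // Hypothetical layout (not the real TsFile format):
    // [numPages:int], then numPages x [pageSize:int][startTime:long][endTime:long],
    // followed by the page bodies back to back.
    record PageHeader(int size, long startTime, long endTime) {}

    static ByteBuffer writeChunk(List<PageHeader> headers, List<byte[]> bodies) {
        int total = Integer.BYTES;
        for (PageHeader h : headers) {
            total += Integer.BYTES + 2 * Long.BYTES + h.size();
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(headers.size());
        for (PageHeader h : headers) {
            buf.putInt(h.size()).putLong(h.startTime()).putLong(h.endTime());
        }
        for (byte[] body : bodies) {
            buf.put(body);
        }
        buf.flip();
        return buf;
    }

    // Reads one chunk front to back: page boundaries come from the chunk's own
    // header, so no metadata from elsewhere in the file is needed.
    static List<byte[]> readChunkSequentially(ByteBuffer buf) {
        int numPages = buf.getInt();
        List<PageHeader> headers = new ArrayList<>();
        for (int i = 0; i < numPages; i++) {
            // Java evaluates arguments left to right, matching the write order.
            headers.add(new PageHeader(buf.getInt(), buf.getLong(), buf.getLong()));
        }
        List<byte[]> pages = new ArrayList<>();
        for (PageHeader h : headers) {
            byte[] body = new byte[h.size()];
            buf.get(body);
            pages.add(body);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<PageHeader> headers = List.of(new PageHeader(3, 0, 9), new PageHeader(2, 10, 19));
        List<byte[]> bodies = List.of(new byte[] {1, 2, 3}, new byte[] {4, 5});
        List<byte[]> pages = readChunkSequentially(writeChunk(headers, bodies));
        System.out.println(pages.size() + " pages, second page has " + pages.get(1).length + " bytes");
    }
}
```

Running `main` prints `2 pages, second page has 2 bytes`. Because all PageHeaders come first in the chunk, a reader could also check their time ranges and skip unwanted page bodies, which is the filtering that option 2 aims for.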

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

