Posted to dev@iotdb.apache.org by Jialin Qiao <qj...@mails.tsinghua.edu.cn> on 2020/03/30 14:02:56 UTC

New TsFile Structure

Hi,


The new TsFile structure (version 2) is ready [1]. 


Write speed is unaffected, while queries are faster, especially aggregation queries.


【Performance evaluation】



Hardware: macOS 10.14.5, 2.2 GHz Intel Core i7, 4 GB memory.

Data set: 1 storage group, 1 device, 3000 measurements; each time series has 600000 data points of long data type.

IoTDB configuration:

enable_parameter_adapter=false
tsfile_size_threshold=1024L
memtable_size_threshold=50*1024*1024L

[Write evaluation]

new_TsFile: 300569ms, 14.76G, 184 tsfiles
master: 300418ms, 14.73G, 184 tsfiles

[Query evaluation]

select s1 from root.sg1.d1

new_TsFile: 1349ms
master: 2102ms

select s1, s2, s3, s4, s5, s6, s7, s8, s9, s10 from root.sg1.d1

new_TsFile: 3268ms
master: 4621ms

select * from root

new_TsFile: 647934ms
master: 814206ms

select count(s1) from root.sg1.d1

new_TsFile: 421ms
master: 1654ms

select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10) from root.sg1.d1

new_TsFile: 1887ms
master: 4231ms

select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10), count(s11), count(s12), count(s13), count(s14), count(s15), count(s16), count(s17), count(s18), count(s19), count(s20), count(s21), count(s22), count(s23), count(s24), count(s25), count(s26), count(s27), count(s28), count(s29), count(s30) from root.sg1.d1

new_TsFile: 3066ms
master: 6653ms

select count(*) from root

new_TsFile: 2243ms
master: 614638ms
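
To make the comparison easier to read, the speedup of each query can be computed from the timings listed above. The following is only a small Python sketch: the numbers are copied from the query evaluation, while the short labels are illustrative shorthand for the full SQL statements and do not appear in the benchmark itself.

```python
# Timings in ms copied from the query evaluation above: (new_TsFile, master).
# The short labels are only illustrative shorthand for the full SQL statements.
TIMINGS_MS = {
    "select s1": (1349, 2102),
    "select s1..s10": (3268, 4621),
    "select *": (647934, 814206),
    "count(s1)": (421, 1654),
    "count(s1..s10)": (1887, 4231),
    "count(s1..s30)": (3066, 6653),
    "count(*)": (2243, 614638),
}


def speedups(timings):
    """Return master_time / new_time for each query label."""
    return {label: master / new for label, (new, master) in timings.items()}


if __name__ == "__main__":
    for label, ratio in speedups(TIMINGS_MS).items():
        print(f"{label:>15}: {ratio:.2f}x")
```

On these numbers, raw-data queries are about 1.3x to 1.6x faster, per-series counts about 2x to 4x faster, and count(*) about 274x faster.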





【Design of new TsFile】


In the previous version, ChunkMetadata is stored by device. Therefore, to query one series, we have to read the ChunkMetadata of all measurements of its device, which is time-consuming.


In the new version, ChunkMetadata is grouped by time series, so querying one series only requires reading the ChunkMetadata of that series. In addition, a TimeseriesMetadata holding file-level statistics is added for each series to accelerate aggregations.
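
The difference between the two layouts can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not the actual TsFile code: the class names ChunkMetadata and TimeseriesMetadata come from the description above, but their fields, the in-memory dictionaries, and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChunkMetadata:
    """Simplified stand-in: one entry per chunk of one measurement."""
    measurement: str
    num_points: int


@dataclass
class TimeseriesMetadata:
    """Simplified stand-in: file-level count plus the chunk list of one series."""
    count: int = 0
    chunks: List[ChunkMetadata] = field(default_factory=list)


def build_indexes(device, measurements, chunks_per_series, points_per_chunk):
    """Build both layouts for one device from the same synthetic chunks."""
    by_device: Dict[str, List[ChunkMetadata]] = {device: []}
    by_series: Dict[Tuple[str, str], TimeseriesMetadata] = {}
    for m in measurements:
        ts_meta = by_series.setdefault((device, m), TimeseriesMetadata())
        for _ in range(chunks_per_series):
            chunk = ChunkMetadata(m, points_per_chunk)
            by_device[device].append(chunk)    # old layout: all measurements together
            ts_meta.chunks.append(chunk)       # new layout: grouped per series
            ts_meta.count += chunk.num_points  # file-level statistic
    return by_device, by_series


def count_old(by_device, device, measurement):
    """Old layout: scan every ChunkMetadata of the device and filter."""
    scanned = 0
    total = 0
    for chunk in by_device[device]:
        scanned += 1
        if chunk.measurement == measurement:
            total += chunk.num_points
    return total, scanned


def count_new(by_series, device, measurement):
    """New layout: answer count() from the file-level statistic, no chunk scan."""
    return by_series[(device, measurement)].count, 0


if __name__ == "__main__":
    names = [f"s{i}" for i in range(1, 3001)]
    by_device, by_series = build_indexes("root.sg1.d1", names, 4, 100)
    print(count_old(by_device, "root.sg1.d1", "s1"))  # (400, 12000)
    print(count_new(by_series, "root.sg1.d1", "s1"))  # (400, 0)
```

With 3000 measurements per device, the old lookup touches every entry of the device to answer a question about a single series, which is consistent with the large gap seen for the count queries.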


Besides, by changing the schema management of TsFile, we removed the constraint that measurements with the same name in one storage group must have the same data type.




[1] https://github.com/apache/incubator-iotdb/pull/855


Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

Re: New TsFile Structure

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi,

Thanks to all the reviewers; the new TsFile is merged.

To release 0.10.0, the next step is to provide an online upgrade function, so that you can launch IoTDB 0.10 directly on a 0.9 data folder.
Old TsFiles can be queried immediately and are upgraded to new TsFiles in the background.
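
As a rough illustration of the idea only (the upgrade function is not implemented yet, so every name here is hypothetical), the behavior could be modeled in Python like this: queries see all files regardless of format, and a background task rewrites old files one at a time. Version 2 is the new format mentioned in this thread; calling the old format version 1 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TsFileResource:
    """Hypothetical handle for one file in the data folder."""
    path: str
    version: int  # 2 = new format; 1 = old format (assumed numbering)


def readable_files(files: List[TsFileResource]) -> List[str]:
    """Queries can use every file right away, whatever its format version."""
    return [f.path for f in files]


def upgrade_next(files: List[TsFileResource]) -> Optional[str]:
    """Upgrade one old-format file; a background task would call this repeatedly."""
    for f in files:
        if f.version == 1:
            f.version = 2  # placeholder for actually rewriting the file
            return f.path
    return None  # nothing left to upgrade


if __name__ == "__main__":
    files = [TsFileResource("1.tsfile", 1), TsFileResource("2.tsfile", 2)]
    print(readable_files(files))  # both files are queryable before the upgrade
    while (path := upgrade_next(files)) is not None:
        print("upgraded", path)
```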

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: atoildw <at...@163.com>
> Sent: 2020-03-31 10:35:37 (Tuesday)
> To: dev <de...@iotdb.apache.org>
> Cc: "dev@iotdb.apache.org" <de...@iotdb.apache.org>
> Subject: Re: New TsFile Structure
> 
> Hi,
> 
> 
> Good job! 
> 
> 
> Looking forward to the next release; I would then like to deploy IoTDB in my company.
> 
> 
> 
> I looked at the PR and it contains a lot of code-formatting changes. Please try to avoid this next time, as it makes the PR harder to review.
> 
> 
> 
> —————
> 
> 
> DaWei Liu

Re: New TsFile Structure

Posted by atoildw <at...@163.com>.
Hi,


Good job! 


Looking forward to the next release; I would then like to deploy IoTDB in my company.



I looked at the PR and it contains a lot of code-formatting changes. Please try to avoid this next time, as it makes the PR harder to review.



—————


DaWei Liu
On 03/31/2020 10:04, Haonan Hou wrote:
Hi,

I ran the performance evaluation too and reached a similar conclusion.

Hardware: macOS 10.15.4, 2.9 GHz Intel Core i5, 8 GB memory.
Data set: 1 storage group, 1 device, 3000 measurements; each time series has 600000 data points of long data type.
1. select s1 from root.sg1.d1
new_TsFile: 2572
master: 2666
2. select s1, s2, s3, s4, s5, s6, s7, s8, s9, s10 from root.sg1.d1
new_TsFile: 5455
master: 6146
3. select count(s1) from root.sg1.d1
new_TsFile: 570
master: 1510
4. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10) from root.sg1.d1
new_TsFile: 2132
master: 3675
5. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10), count(s11), count(s12), count(s13), count(s14), count(s15), count(s16), count(s17), count(s18), count(s19), count(s20) from root.sg1.d1
new_TsFile: 2874
master: 5357
Thanks,
Haonan Hou




Re: New TsFile Structure

Posted by Haonan Hou <hh...@outlook.com>.
Hi,

I ran the performance evaluation too and reached a similar conclusion.

Hardware: macOS 10.15.4, 2.9 GHz Intel Core i5, 8 GB memory.
Data set: 1 storage group, 1 device, 3000 measurements; each time series has 600000 data points of long data type.
1. select s1 from root.sg1.d1
new_TsFile: 2572
master: 2666
2. select s1, s2, s3, s4, s5, s6, s7, s8, s9, s10 from root.sg1.d1
new_TsFile: 5455
master: 6146
3. select count(s1) from root.sg1.d1
new_TsFile: 570
master: 1510
4. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10) from root.sg1.d1
new_TsFile: 2132
master: 3675
5. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10), count(s11), count(s12), count(s13), count(s14), count(s15), count(s16), count(s17), count(s18), count(s19), count(s20) from root.sg1.d1
new_TsFile: 2874
master: 5357
Thanks,
Haonan Hou




Re: New TsFile Structure

Posted by 田原 <ti...@mails.tsinghua.edu.cn>.
Hi, Qiao

Wow, very nice work. I ran a similar evaluation and got the same results as you.
I hope the new version can be merged as soon as possible.

