Posted to dev@iotdb.apache.org by zyx <th...@qq.com> on 2020/07/12 03:22:57 UTC

[Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Hi,


I'd like to share this week's report with you.


[Big Event]
# We held the first online discussion yesterday and look forward to more attendees next time.


[Finished work]
#1453 Refactor RPC and Sync modules to avoid redundant code
#1400 Support deleting a time range


[In progress]
#1411 Virtual Memtable for generating large Chunk in TsFile
#1418 Fix problem when using quotation marks in the time series path
#1448 Add LZ4 compression method




[Bug fix]
#1460 Fix restart failure
#1470 Fix OOM when running show timeseries


Have a great weekend.


Thanks,

Yuxin Zhang






Re: Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Posted by "runhuster@foxmail.com" <ru...@foxmail.com>.
Good job! 

It looks like the cost of reading chunkMeta is acceptable. We do not need to keep chunkMeta in memory, so hot compaction could be folded into the merge operation.



Thanks!

runhuster@foxmail.com 

 
From: Haonan Hou
Date: 2020-07-13 22:27
To: dev@iotdb.apache.org
Subject: Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)
Hi,
 
I ran the experiment to measure what percentage of the whole reading process is spent reading metadata.

I created one TsFile with 1 storage group, 1 device and 100,000 measurements. Each measurement has 8 chunks of data and each chunk has 200 data points. The total size of the TsFile is 909.5 MB.

Then I wrote and ran a test on an HDD that reads all ChunkMetadata and then reads all data points one by one using the ChunkMetadata and ChunkReader. This gave me the time for reading all ChunkMetadata and all data points.

The final results are below:

Reading 800,000 ChunkMetadata took 2977 ms.
Reading 160,000,000 points took 60780 ms.
 
Best wishes,
 
Haonan Hou
 

Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Posted by Haonan Hou <hh...@outlook.com>.
Hi,

I ran the experiment to measure what percentage of the whole reading process is spent reading metadata.

I created one TsFile with 1 storage group, 1 device and 100,000 measurements. Each measurement has 8 chunks of data and each chunk has 200 data points. The total size of the TsFile is 909.5 MB.

Then I wrote and ran a test on an HDD that reads all ChunkMetadata and then reads all data points one by one using the ChunkMetadata and ChunkReader. This gave me the time for reading all ChunkMetadata and all data points.

The final results are below:

Reading 800,000 ChunkMetadata took 2977 ms.
Reading 160,000,000 points took 60780 ms.
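
As a rough cross-check of these figures, the short Python sketch below recomputes the chunk and point counts from the file layout and derives the share of total read time spent on ChunkMetadata. It uses only the numbers reported above; the variable names are illustrative, and the result describes this read test only, not a measured compaction run.

```python
# Figures reported in this message.
measurements = 100_000
chunks_per_measurement = 8
points_per_chunk = 200
metadata_ms = 2977   # time to read all ChunkMetadata
points_ms = 60780    # time to read all data points

# Derived counts should match the 800,000 and 160,000,000 quoted above.
chunk_count = measurements * chunks_per_measurement
point_count = chunk_count * points_per_chunk

# Fraction of the total read time that went to metadata.
metadata_share = metadata_ms / (metadata_ms + points_ms)

print(f"chunks: {chunk_count:,}")
print(f"points: {point_count:,}")
print(f"metadata share of total read time: {metadata_share:.1%}")
```

Under these numbers, reading metadata accounts for a little under 5% of the total read time.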

Best wishes,

Haonan Hou


Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi,

> It would be good if the details of what was discussed were shared with
> this list.
> Having meetings like this disadvantages those who cannot attend due to
> time zones or other commitments.


> Thanks Justin, I would like to add some details.

Before the discussion, I created a Google Doc to collect the attendee list
and topics. After the discussion, I think we can record what was discussed
and the conclusions, and archive them in our Confluence.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


On Mon, Jul 13, 2020 at 9:57 AM, Dawei Liu <at...@163.com> wrote:

> Hi,
>
>
> Thanks, Yuxin and Jialin.
>
>
> Best
> ———
> Dawei Liu
> On 07/12/2020 20:48,Jialin Qiao<qj...@mails.tsinghua.edu.cn> wrote:
> Hi,
>
> Thanks Justin, I would like to add some details.
>
> We first introduced the main idea of hot compaction: when a memtable
> reaches the threshold but the average number of points in each series has
> not reached our target, we flush it to a vm (virtual memory) file. Once
> there are enough vm files (the number is configurable), we merge all vm
> files into the target TsFile and close the TsFile. In this way, we get
> larger chunks that are better suited to queries.
>
> We call it hot compaction rather than normal compaction because the vm
> file is not closed, which means it contains only data chunks without the
> related metadata. All metadata of the vm files is cached in memory, so hot
> compaction avoids the IO, serialization and deserialization of this
> metadata. It is essentially trading memory for IO and CPU.
>
> However, we do not have a clear idea of what percentage of compaction time
> is spent reading metadata, so we decided to run an experiment first.
>
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
> 清华大学 软件学院
>
> -----Original Message-----
> From: "Justin Mclean" <ju...@classsoftware.com>
> Sent: 2020-07-12 17:31:30 (Sunday)
> To: dev <de...@iotdb.apache.org>
> Cc:
> Subject: Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)
>
> Hi,
>
> [Big Event]
> # We held the first online discussion yesterday and look forward to more
> attendees next time.
>
> It would be good if the details of what was discussed were shared with
> this list.
>
> Having meetings like this disadvantages those who cannot attend due to
> time zones or other commitments.
>
> Thanks,
> Justin
>

Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Posted by Dawei Liu <at...@163.com>.
Hi,


Thanks, Yuxin and Jialin.


Best
———
Dawei Liu

Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi,

Thanks Justin, I would like to add some details.

We first introduced the main idea of hot compaction: when a memtable reaches the threshold but the average number of points in each series has not reached our target, we flush it to a vm (virtual memory) file. Once there are enough vm files (the number is configurable), we merge all vm files into the target TsFile and close the TsFile. In this way, we get larger chunks that are better suited to queries.
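
The flush-or-merge decision described here can be sketched roughly as follows. This is a minimal in-memory model in Python, not IoTDB code: the class, field and method names are hypothetical, and the branch for a memtable that already meets the target (writing it straight to a TsFile) is an assumption, since only the small-chunk case is described.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical model: a "file" is just a mapping from series name to points.
Series = Dict[str, List[int]]


@dataclass
class HotCompactionSketch:
    target_points_per_series: int  # desired average number of points per series
    max_vm_files: int              # configured number of vm files that triggers a merge
    vm_files: List[Series] = field(default_factory=list)
    closed_tsfiles: List[Series] = field(default_factory=list)

    def on_memtable_full(self, memtable: Series) -> str:
        """Handle a memtable that has reached the flush threshold."""
        total_points = sum(len(points) for points in memtable.values())
        average = total_points / len(memtable)
        if average >= self.target_points_per_series:
            # Assumption: chunks are already large enough, so close a TsFile directly.
            self.closed_tsfiles.append(memtable)
            return "flushed to TsFile"
        # Chunks would be too small: keep the data in an unclosed vm file.
        self.vm_files.append(memtable)
        if len(self.vm_files) >= self.max_vm_files:
            self._merge_vm_files()
            return "merged vm files into TsFile"
        return "flushed to vm file"

    def _merge_vm_files(self) -> None:
        """Concatenate each series across all vm files into one larger chunk."""
        merged: Series = {}
        for vm in self.vm_files:
            for name, points in vm.items():
                merged.setdefault(name, []).extend(points)
        self.closed_tsfiles.append(merged)
        self.vm_files.clear()


policy = HotCompactionSketch(target_points_per_series=200, max_vm_files=2)
first = policy.on_memtable_full({"s1": [1, 2], "s2": [3, 4]})
second = policy.on_memtable_full({"s1": [5, 6], "s2": [7, 8]})
print(first)
print(second)
print(policy.closed_tsfiles)
```

In this toy run the first memtable is parked in a vm file and the second one triggers the merge, so each series ends up in a single chunk twice as long as either flush would have produced on its own.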

We call it hot compaction rather than normal compaction because the vm file is not closed, which means it contains only data chunks without the related metadata. All metadata of the vm files is cached in memory, so hot compaction avoids the IO, serialization and deserialization of this metadata. It is essentially trading memory for IO and CPU.

However, we do not have a clear idea of what percentage of compaction time is spent reading metadata, so we decided to run an experiment first.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院


Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> [Big Event]
> # We held the first online discussion yesterday and look forward to more attendees next time.

It would be good if the details of what was discussed were shared with this list.

Having meetings like this disadvantages those who cannot attend due to time zones or other commitments.

Thanks,
Justin