Posted to dev@iotdb.apache.org by Gaofei Cao <cg...@foxmail.com> on 2019/01/04 16:12:04 UTC

merge kill_thanos branch to the master branch

The kill_thanos branch in thulab/iotdb refactors most features, as described below.


1. Replacing single-point calculation with batch data loading.
In the previous branch, the two most important methods of the `Reader` in IoTDB were `hasNext` and `next`, which check whether the queried series has a next point and compute that point. Calling these two methods once per point degrades query performance, so we added two new methods, `hasNextBatch` and `nextBatch`. As a result, data is loaded and transferred in batches rather than one point at a time, which is more CPU-friendly.


2. Using NIO.
In this branch, we replaced `ByteArrayInputStream` with Java NIO and now make heavier use of `Channel`, `Buffer`, and `MMap` (memory-mapped files).


3. Adding a file stream manager.
A single IoTDB query may read multiple series, e.g. the SQL `select * from root.vehicle`. To avoid opening the same TsFile multiple times, we adopted a file stream manager, which ensures that each file is opened at most once across IoTDB queries. We use an `ExpiredTimeMap` to manage opened file streams and close files that have not been used within a given expiration time. There may be better ways to manage file stream readers; I will keep track of this.


4. Optimizing filter efficiency.
First, we removed the previous `Visitor Pattern` implementation of filters and adopted a more intuitive implementation.
Second, we optimized some filter logic to improve performance. For example, for the SQL `select sensor_0, sensor_1 from device_0 where sensor_1 > 10`, we avoid computing the data of `sensor_1` twice.


5. Others, such as removing Thrift serialization and changing the TsFile file format; perhaps someone else can fill in the details.
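Item 1 above contrasts point-at-a-time reading with batch reading. The sketch below illustrates only the call pattern, under stated assumptions: the series is a plain in-memory array, and `SeriesReader`, `BatchData`, `sumByPoint` and `sumByBatch` are hypothetical names, not the actual IoTDB or TsFile classes; only the method names `hasNext`/`next`/`hasNextBatch`/`nextBatch` come from the post.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch: names other than hasNext/next/hasNextBatch/nextBatch are illustrative.
public class BatchReaderSketch {

  /** A batch of (time, value) pairs held in parallel primitive arrays. */
  static final class BatchData {
    final long[] times;
    final double[] values;

    BatchData(long[] times, double[] values) {
      this.times = times;
      this.values = values;
    }

    int size() { return times.length; }
  }

  /** Reads an in-memory series either point by point or batch by batch. */
  static final class SeriesReader {
    private final long[] times;
    private final double[] values;
    private final int batchSize;
    private int cursor = 0;

    SeriesReader(long[] times, double[] values, int batchSize) {
      this.times = times;
      this.values = values;
      this.batchSize = batchSize;
    }

    // Point API: one hasNext()/next() call pair per data point.
    boolean hasNext() { return cursor < times.length; }

    double next() {
      if (!hasNext()) throw new NoSuchElementException();
      return values[cursor++];
    }

    // Batch API: one hasNextBatch()/nextBatch() call pair per batch of up to batchSize points.
    boolean hasNextBatch() { return cursor < times.length; }

    BatchData nextBatch() {
      if (!hasNextBatch()) throw new NoSuchElementException();
      int n = Math.min(batchSize, times.length - cursor);
      long[] t = new long[n];
      double[] v = new double[n];
      System.arraycopy(times, cursor, t, 0, n);
      System.arraycopy(values, cursor, v, 0, n);
      cursor += n;
      return new BatchData(t, v);
    }
  }

  static double sumByPoint(SeriesReader reader) {
    double sum = 0;
    while (reader.hasNext()) {
      sum += reader.next(); // two reader calls per point
    }
    return sum;
  }

  static double sumByBatch(SeriesReader reader) {
    double sum = 0;
    while (reader.hasNextBatch()) {
      BatchData batch = reader.nextBatch();
      // Tight loop over a primitive array; no per-point reader calls.
      for (int i = 0; i < batch.size(); i++) {
        sum += batch.values[i];
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] times = new long[10];
    double[] values = new double[10];
    for (int i = 0; i < 10; i++) {
      times[i] = i;
      values[i] = i;
    }
    System.out.println(sumByPoint(new SeriesReader(times, values, 4))); // 45.0
    System.out.println(sumByBatch(new SeriesReader(times, values, 4))); // 45.0
  }
}
```

The batch loop makes one `hasNextBatch()`/`nextBatch()` pair per batch instead of one `hasNext()`/`next()` pair per point, then iterates over a primitive array. How much this gains in practice likely depends on work the sketch omits, such as page decoding.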

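Items 2 and 3 above fit together: a manager that hands out NIO `FileChannel`s, opening each file at most once and closing it after an idle period. This is a minimal sketch under assumptions: `FileStreamManager`, `CachedChannel`, `readLongAt` and `sumMapped` are hypothetical names, the file simply holds big-endian `long`s rather than the TsFile format, and expiry is an explicit `closeExpired()` sweep with an injectable clock, not necessarily the policy of the branch's `ExpiredTimeMap`. The NIO calls themselves (`FileChannel.open`, positional `read`, `map`) are standard JDK APIs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.LongSupplier;

public class FileStreamManagerSketch {

  /** Hypothetical manager (not IoTDB's ExpiredTimeMap): at most one open channel per path. */
  static final class FileStreamManager {
    private static final class CachedChannel {
      final FileChannel channel;
      long lastAccess;

      CachedChannel(FileChannel channel) { this.channel = channel; }
    }

    private final Map<Path, CachedChannel> channels = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be simulated

    FileStreamManager(long ttlMillis, LongSupplier clock) {
      this.ttlMillis = ttlMillis;
      this.clock = clock;
    }

    /** Returns the cached channel for path, opening the file only on first use. */
    synchronized FileChannel get(Path path) throws IOException {
      CachedChannel cached = channels.get(path);
      if (cached == null) {
        cached = new CachedChannel(FileChannel.open(path, StandardOpenOption.READ));
        channels.put(path, cached);
      }
      cached.lastAccess = clock.getAsLong();
      return cached.channel;
    }

    /** Closes and forgets channels idle for longer than ttlMillis; returns how many. */
    synchronized int closeExpired() throws IOException {
      long now = clock.getAsLong();
      int closed = 0;
      Iterator<CachedChannel> it = channels.values().iterator();
      while (it.hasNext()) {
        CachedChannel cached = it.next();
        if (now - cached.lastAccess > ttlMillis) {
          cached.channel.close();
          it.remove();
          closed++;
        }
      }
      return closed;
    }
  }

  /** Reads one long at a byte offset via a positional read into a ByteBuffer. */
  static long readLongAt(FileChannel ch, long offset) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
    while (buf.hasRemaining()) {
      if (ch.read(buf, offset + buf.position()) < 0) throw new IOException("unexpected EOF");
    }
    buf.flip();
    return buf.getLong();
  }

  /** Memory-maps the whole file read-only and sums every long in it. */
  static long sumMapped(FileChannel ch) throws IOException {
    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    long sum = 0;
    while (map.remaining() >= Long.BYTES) sum += map.getLong();
    return sum;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("tsfile-sketch", ".bin");
    file.toFile().deleteOnExit();
    ByteBuffer out = ByteBuffer.allocate(3 * Long.BYTES);
    out.putLong(10).putLong(20).putLong(30);
    Files.write(file, out.array());

    long[] now = {0};
    FileStreamManager mgr = new FileStreamManager(1000, () -> now[0]);
    FileChannel a = mgr.get(file);
    FileChannel b = mgr.get(file);                 // same path: no second open
    System.out.println(a == b);                    // true
    System.out.println(readLongAt(a, Long.BYTES)); // 20
    System.out.println(sumMapped(b));              // 60
    now[0] = 5000;                                 // simulate 5 s of inactivity
    System.out.println(mgr.closeExpired());        // 1
    System.out.println(a.isOpen());                // false
  }
}
```

A real manager would probably also need to avoid closing a channel that a running query still holds (for example with reference counting); this sketch does not handle that.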

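Item 4 above describes two changes: filters that evaluate themselves directly instead of through a Visitor, and avoiding duplicate work on `sensor_1` in `select sensor_0, sensor_1 from device_0 where sensor_1 > 10`. The sketch below is one possible reading of both, not the branch's code: `Filter`, `valueGt`, `timeGtEq`, `and`, `CountingSeries` and `select` are hypothetical, and it assumes the duplicate work was re-reading `sensor_1` for the output after already reading it for the predicate. `CountingSeries` counts reads so the difference is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative, not IoTDB's filter API.
public class FilterSketch {

  /** A filter evaluates itself directly; no Visitor is needed to interpret it. */
  interface Filter {
    boolean satisfy(long time, double value);
  }

  static Filter valueGt(double threshold) { return (t, v) -> v > threshold; }

  static Filter timeGtEq(long start) { return (t, v) -> t >= start; }

  static Filter and(Filter left, Filter right) {
    return (t, v) -> left.satisfy(t, v) && right.satisfy(t, v);
  }

  /** A column that counts how often its values are read. */
  static final class CountingSeries {
    private final double[] data;
    int reads = 0;

    CountingSeries(double... data) { this.data = data; }

    double read(int i) {
      reads++;
      return data[i];
    }
  }

  /**
   * select sensor_0, sensor_1 where <filter on sensor_1>. If reuseFilterValue is true,
   * the sensor_1 value loaded for the predicate is reused for the output row;
   * otherwise it is read a second time.
   */
  static List<double[]> select(long[] times, CountingSeries s0, CountingSeries s1,
                               Filter onS1, boolean reuseFilterValue) {
    List<double[]> rows = new ArrayList<>();
    for (int i = 0; i < times.length; i++) {
      double v1 = s1.read(i);
      if (onS1.satisfy(times[i], v1)) {
        double out1 = reuseFilterValue ? v1 : s1.read(i);
        rows.add(new double[] {times[i], s0.read(i), out1});
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    long[] times = {100, 101, 102, 103};
    Filter f = and(timeGtEq(100), valueGt(10));

    CountingSeries naive = new CountingSeries(5, 12, 20, 8);
    select(times, new CountingSeries(1, 2, 3, 4), naive, f, false);

    CountingSeries reused = new CountingSeries(5, 12, 20, 8);
    List<double[]> rows = select(times, new CountingSeries(1, 2, 3, 4), reused, f, true);

    // prints "2 rows; sensor_1 reads: naive=6, reused=4"
    System.out.println(rows.size() + " rows; sensor_1 reads: naive=" + naive.reads
        + ", reused=" + reused.reads);
  }
}
```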
I suggest merging it into the master branch next week.


Experimental results show that query tests on the kill_thanos branch achieve approximately 30% to 60% better performance.


By the way, I am considering how to obtain standard, convincing test data (in the IoT domain) to test the write and query performance of IoTDB. Currently, we just use the data generated by `IoTDB Benchmark` (a separate project, available at github.com/thulab/iotdb-benchmark), which generates 100,000 row records for 100 devices × 100 sensors.


Thanks & Best Regards


-----------------------------------
Cao Gaofei (曹高飞)
School of Software,
Tsinghua University
-----------------------------------

Re: merge kill_thanos branch to the master branch

Posted by Xu yi <xu...@126.com>.
I'm not sure all the unit tests pass, especially on Windows. This needs to be re-checked.


On 2019/01/05 11:54, 乔嘉林 Jialin Qiao <qj...@mails.tsinghua.edu.cn> wrote:



Re: Re: merge kill_thanos branch to the master branch

Posted by 乔嘉林 Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi, glad to kill Thanos :)

I agree to merge kill_thanos into master. We have already made an unofficial release (v0.7.1) of master on GitHub as a backup.

As a reminder, apart from these optimizations, some features of the master branch are not yet available in kill_thanos:

1. Update and Delete operations.

2. Advanced queries: Aggregation, GroupByTime and Fill.

Besides, the `hasNextBatch` and `nextBatch` methods are implemented in TsFile, but most of the corresponding work in the IoTDB engine remains to be done.

The kill_thanos branch changes a lot... We can add these features and further optimize the code in other PRs.

Best.

--
Jialin Qiao (乔嘉林)
School of Software, Tsinghua University

> -----Original Message-----
> From: "Xiangdong Huang" <sa...@gmail.com>
> Sent: 2019-01-05 11:36:07 (Saturday)
> To: dev@iotdb.apache.org
> Cc:
> Subject: Re: merge kill_thanos branch to the master branch

Re: merge kill_thanos branch to the master branch

Posted by Xiangdong Huang <sa...@gmail.com>.
I think the biggest issue with the current master is that its package
structure is chaotic.
This prevents new developers from understanding the project.
It is a villain, like Thanos in the Marvel Universe. That's why the new
branch is called kill_Thanos.

Besides what Gaofei mentioned, the storage module, TsFile, has also been
refactored, and the file format has changed somewhat.
A brief introduction is at
https://github.com/thulab/iotdb/wiki/%5BTsFile%5D-What-is-new-from-v0.7.0--to-Kill_Thanos


In the kill_Thanos branch, the package structure is clearer, but there
is still a lot of source code that could be refactored better.
However, merging the modifications from master into kill_Thanos takes
extra work.

Because all unit tests (UT) and integration tests (IT) in the current
kill_Thanos branch pass, and its performance is better, I agree to merge
the branches as soon as possible.

Best,
-----------------------------------
Xiangdong Huang (黄向东)
School of Software, Tsinghua University


On Sat, Jan 5, 2019 at 12:23 AM, Gaofei Cao <cg...@foxmail.com> wrote:
