Posted to dev@iotdb.apache.org by Steve Su <st...@qq.com> on 2020/10/10 14:20:24 UTC

Share some experiment results about Gorilla encoding algorithm

Hi,

Recently, we noticed that the Gorilla encoding implementation used inside IoTDB may have some issues: after encoding, time series data (the value part) can take up more space than before. This is not in line with expectations, since Gorilla encoding normally makes the data smaller.
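
For context, here is a minimal sketch of the XOR step that Gorilla applies to values, assuming the bit layout described in the original Gorilla paper (1 control bit for a repeated value, otherwise 2 control bits plus either a reused window or 5 + 6 header bits and the meaningful bits). The class and method names are hypothetical; this is neither IoTDB's code nor Michael's, and it only counts bits instead of writing them.

```java
// Illustrative sketch only: counts the bits a Gorilla-style XOR encoder would
// emit for a series of doubles. Names are hypothetical, not IoTDB's API.
final class GorillaXorSketch {

    /** Returns the number of bits needed to encode the given values. */
    static long encodedBits(double[] values) {
        if (values.length == 0) {
            return 0;
        }
        long bits = 64;                 // the first value is stored verbatim
        long prev = Double.doubleToRawLongBits(values[0]);
        int prevLeading = -1;           // -1 means "no previous window yet"
        int prevTrailing = -1;
        for (int i = 1; i < values.length; i++) {
            long cur = Double.doubleToRawLongBits(values[i]);
            long xor = prev ^ cur;
            if (xor == 0) {
                bits += 1;              // control bit '0': same value as before
            } else {
                // Leading zeros are stored in 5 bits, so cap them at 31.
                int leading = Math.min(Long.numberOfLeadingZeros(xor), 31);
                int trailing = Long.numberOfTrailingZeros(xor);
                if (prevLeading >= 0 && leading >= prevLeading && trailing >= prevTrailing) {
                    // control bits '10': payload fits the previous window
                    bits += 2 + (64 - prevLeading - prevTrailing);
                } else {
                    // control bits '11': 5 bits of leading zeros, 6 bits of length, payload
                    bits += 2 + 5 + 6 + (64 - leading - trailing);
                    prevLeading = leading;
                    prevTrailing = trailing;
                }
            }
            prev = cur;
        }
        return bits;
    }
}
```

In this sketch a repeated value costs 1 bit, while a value whose XOR has no leading or trailing zeros costs 2 + 5 + 6 + 64 = 77 bits, more than the 64 raw bits. That is one way an encoded series can end up larger than its input, although it does not by itself explain the behaviour observed in IoTDB.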

I found a very good open-source implementation of the Gorilla algorithm by Michael on GitHub (see https://github.com/burmanm/gorilla-tsc). I compared the encoding / decoding time cost and the compression ratio of Michael's version with those of the version used internally by IoTDB, and found that the version used inside IoTDB does have a lot of room for improvement.

See https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm for more experiment details.

I think we can use Michael's implementation as a reference to re-implement the algorithm inside IoTDB, in order to reduce the size of the encoded data (fixing potential errors) and improve performance. I have created a JIRA issue (see https://issues.apache.org/jira/browse/IOTDB-938) for this. If possible, I would be happy to re-implement the algorithm.

Thanks,
Steve Su

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi,

> I think we can change the name of the old Gorilla encoding to
TSEncoding.OLD_GORILLA in the code under the premise of ensuring the
compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for
the re-implemented version. This may minimize the impact on users.

I prefer this approach. OLD_GORILLA can still be serialized as "6", and then
we assign a new short value to the new Gorilla.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


On Sun, Oct 11, 2020 at 11:53 PM, Steve Su <st...@qq.com> wrote:

> Hi,
>
> From my point of view, since the reimplementation of this algorithm does
> not change the structure of TsFile, there is no need to upgrade the version
> number of TsFile to 000003.
>
> I think we can change the name of the old Gorilla encoding to
> TSEncoding.OLD_GORILLA in the code under the premise of ensuring the
> compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for
> the re-implemented version. This may minimize the impact on users.
>
> What do you think? :)
>
> Steve Su
>
> ------------------ Original ------------------
> From: "dev" <sa...@gmail.com>;
> Date: Sat, Oct 10, 2020 11:35 PM
> To: "dev"<de...@iotdb.apache.org>;
> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>
> Hi,
>
> Nice!
>
> One question. So, if we reimplement the Gorilla algorithm, how to consider
> the version compatibility?
>
> 1. Upgrade the TsFile version to 000003, or
> 2. Add a new encoding name to the corrected gorilla.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Steve Su <st...@qq.com>.
Hi,

I totally agree with Chris.

We can use TSEncoding.GORILLA_V1 and TSEncoding.GORILLA_V2 to represent the two implementations of the Gorilla algorithm. When the user specifies Gorilla encoding to create a time series, we can always select the latest version of the encoding for the user.
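
A minimal sketch of what this could look like, assuming an enum that carries its own serialized id. The type EncodingSketch and its methods are hypothetical and are not IoTDB's actual TSEncoding. The id 6 for the old implementation comes from earlier in this thread; the id 8 for the new one is only an assumed free value.

```java
import java.util.Locale;

// Hypothetical sketch; IoTDB's real TSEncoding enum has more members and may differ.
enum EncodingSketch {
    GORILLA_V1((short) 6),  // old implementation keeps the id already written in old TsFiles
    GORILLA_V2((short) 8);  // 8 is an assumed, previously unused id

    private final short id;

    EncodingSketch(short id) {
        this.id = id;
    }

    /** Value written into the file for this encoding. */
    short serialize() {
        return id;
    }

    /** Maps an id read from a file back to the encoding. */
    static EncodingSketch deserialize(short id) {
        for (EncodingSketch e : values()) {
            if (e.id == id) {
                return e;
            }
        }
        throw new IllegalArgumentException("Unknown encoding id: " + id);
    }

    /** Resolves a user-supplied name; plain "GORILLA" maps to the latest version. */
    static EncodingSketch fromUserName(String name) {
        String upper = name.trim().toUpperCase(Locale.ROOT);
        if (upper.equals("GORILLA")) {
            return GORILLA_V2;
        }
        return valueOf(upper);
    }
}
```

Because files store only the id, old TsFiles that contain 6 would still be decoded with the old implementation, while new series created with "GORILLA" would get the newer one.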

Steve Su

------------------ Original ------------------
From: "dev" <hh...@outlook.com>;
Date: Mon, Oct 12, 2020 2:52 PM
To: "dev-iotdb"<de...@iotdb.apache.org>;
Subject: Re: Share some experiment results about Gorilla encoding algorithm

Hi,

Version number +1

When I was doing the tsfile upgrading module, I changed a lot of javadoc about Old and New TsFile, which made me so confused, to v1 and v2.

Thanks,

Haonan Hou

> On Oct 12, 2020, at 2:42 PM, Jialin Qiao <qj...@mails.tsinghua.edu.cn> wrote:
> 
> Hi,
> 
> +1 for version number :)
> 
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
> 
> 乔嘉林
> 清华大学 软件学院
> 
>> -----Original Message-----
>> From: "Christofer Dutz" <ch...@c-ware.de>
>> Sent: 2020-10-12 14:38:34 (Monday)
>> To: "dev@iotdb.apache.org" <de...@iotdb.apache.org>
>> Cc: 
>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>> 
>> Whatever you do : don't call anything "old" or "new".
>> 
>> In two years the new "new" might be the new "old"... What happens then?... Append version numbers... That's sustainable...
>> 
>> Chris
>> ________________________________
>> From: Jialin Qiao <qj...@mails.tsinghua.edu.cn>
>> Sent: Monday, October 12, 2020 08:32
>> To: dev@iotdb.apache.org <de...@iotdb.apache.org>
>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>> 
>> Hi,
>> 
>> Maintaining two versions of gorilla encoding is ok.
>> 
>> Could we change the default time encoding from TS2_DIFF to Gorilla and keep compatible?
>> 
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>> 
>> 乔嘉林
>> 清华大学 软件学院

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Haonan Hou <hh...@outlook.com>.
Hi,

Version number +1

When I was working on the TsFile upgrade module, I changed a lot of Javadoc that referred to the "Old" and "New" TsFile, which I found very confusing, to v1 and v2.

Thanks,

Haonan Hou

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi,

+1 for version number :)

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Christofer Dutz <ch...@c-ware.de>.
Whatever you do: don't call anything "old" or "new".

In two years the new "new" might be the new "old"... What happens then?... Append version numbers... That's sustainable...

Chris

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Steve Su <st...@qq.com>.
Hi,

> Could we change the default time encoding from TS2_DIFF to Gorilla and keep compatible?
Yes. 

In Michael's implementation, the time encoding is essentially a delta-of-delta encoding (similar to TS2_DIFF, but with some improvements). We can re-implement TS2_DIFF based on Michael's implementation and name the two encodings TS2_DIFF_V1 and TS2_DIFF_V2.
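
To illustrate the idea, here is a minimal sketch of delta-of-delta on timestamps. It keeps the results as plain long values and leaves out the variable-length bit packing that real encoders add on top. The class and method names are hypothetical, and this is neither TS2_DIFF's nor Michael's actual layout.

```java
// Illustrative sketch of delta-of-delta; not the bit layout of TS2_DIFF or gorilla-tsc.
final class DeltaOfDeltaSketch {

    /** Output layout: [first timestamp, first delta, delta-of-delta, ...]. */
    static long[] encode(long[] timestamps) {
        long[] out = new long[timestamps.length];
        long prevDelta = 0;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0) {
                out[0] = timestamps[0];
            } else {
                long delta = timestamps[i] - timestamps[i - 1];
                out[i] = delta - prevDelta;  // for i == 1 this is the first delta itself
                prevDelta = delta;
            }
        }
        return out;
    }

    /** Inverse of encode: rebuilds the original timestamps. */
    static long[] decode(long[] encoded) {
        long[] timestamps = new long[encoded.length];
        long prevDelta = 0;
        for (int i = 0; i < encoded.length; i++) {
            if (i == 0) {
                timestamps[0] = encoded[0];
            } else {
                prevDelta += encoded[i];     // recover the current delta
                timestamps[i] = timestamps[i - 1] + prevDelta;
            }
        }
        return timestamps;
    }
}
```

For timestamps sampled at a nearly fixed interval, most delta-of-delta values are 0 or close to 0, which is what lets a bit-level encoder store them in very few bits.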

Steve Su

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi,

Maintaining two versions of gorilla encoding is ok.

Could we change the default time encoding from TS2_DIFF to Gorilla and keep compatible?

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Steve Su <st...@qq.com>.
Hi,

From my point of view, since the reimplementation of this algorithm does not change the structure of TsFile, there is no need to upgrade the version number of TsFile to 000003.

I think we can change the name of the old Gorilla encoding to TSEncoding.OLD_GORILLA in the code under the premise of ensuring the compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for the re-implemented version. This may minimize the impact on users.

What do you think? :)

Steve Su

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi,

Good job! Michael's Gorilla implementation could be used to improve both our TS2_DIFF and Gorilla encodings.

I support doing this upgrade in TsFile version 000003.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

Re: Share some experiment results about Gorilla encoding algorithm

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi,

Nice!

One question: if we re-implement the Gorilla algorithm, how should we handle
version compatibility?

1. Upgrade the TsFile version to 000003, or
2. Add a new encoding name to the corrected gorilla.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


The new Gorilla encoding algorithm

Posted by Steve Su <st...@qq.com>.
Hi,

The new Gorilla encoding algorithm is now ready. The PR is here [1].

Compared with the old implementation, the new implementation has the following advantages:
1) New types are supported: INT32 and INT64 (FLOAT and DOUBLE are already supported by the old implementation)
2) About 4x faster when encoding and decoding
3) The size of the encoded data is reduced by about 20%
Here's the performance report: [2].

Please take a look and review :D

Thanks,
Steve

[1] https://github.com/apache/iotdb/pull/1856
[2] https://cwiki.apache.org/confluence/display/IOTDB/IOTDB-938+Re-implement+Gorilla+encoding+algorithm
