Posted to dev@iotdb.apache.org by Steve Su <st...@qq.com> on 2020/10/26 02:14:29 UTC

The new Gorilla encoding algorithm

Hi,

The new Gorilla encoding algorithm is now ready. The PR is here: [1].

Compared with the old implementation, the new one has the following advantages:
1) It supports two new types, INT32 and INT64 (FLOAT and DOUBLE were already supported by the old implementation)
2) Encoding and decoding are about 4x faster
3) The encoded data is about 20% smaller
Here's the performance report: [2].

Please take a look and review :D

Thanks,
Steve

[1] https://github.com/apache/iotdb/pull/1856
[2] https://cwiki.apache.org/confluence/display/IOTDB/IOTDB-938+Re-implement+Gorilla+encoding+algorithm

------------------ Original ------------------
From: "Steve Su" <st...@qq.com>;
Date: Sat, Oct 10, 2020 10:20 PM
To: "dev"<de...@iotdb.apache.org>;
Subject: Share some experiment results about Gorilla encoding algorithm

Hi,

Recently, we realized that the Gorilla encoding algorithm used inside IoTDB may have some issues: after encoding, time series data (the value part) can take up more space than before. This is not what we expect, since Gorilla encoding usually makes the data smaller.
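
To illustrate why Gorilla encoding is normally expected to shrink the value part, here is a rough sketch that counts how many bits the XOR-based value scheme described in the original Gorilla paper would need for a series of doubles. It is only an estimate under that assumption, not the bit layout used by IoTDB or by gorilla-tsc, and the function names (double_bits, gorilla_value_bits) are made up for this example.

```python
import struct


def double_bits(value: float) -> int:
    """Return the IEEE 754 bit pattern of a double as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def gorilla_value_bits(values) -> int:
    """Estimate the encoded size in bits of a list of doubles.

    Assumed scheme (from the Gorilla paper, not taken from IoTDB):
      - the first value is stored raw in 64 bits
      - XOR with previous value == 0: one '0' bit
      - XOR fits the previous window: 2 control bits + window bits
      - otherwise: 2 control bits + 5 bits (leading zeros)
        + 6 bits (length) + the meaningful bits
    """
    if not values:
        return 0
    total = 64
    prev = double_bits(values[0])
    window = None  # (leading zeros, trailing zeros) of the last new window
    for v in values[1:]:
        cur = double_bits(v)
        xor = prev ^ cur
        prev = cur
        if xor == 0:
            total += 1
            continue
        # Leading zeros are capped at 31 so they fit in 5 bits.
        leading = min(64 - xor.bit_length(), 31)
        trailing = (xor & -xor).bit_length() - 1
        if window is not None and leading >= window[0] and trailing >= window[1]:
            total += 2 + (64 - window[0] - window[1])
        else:
            window = (leading, trailing)
            total += 2 + 5 + 6 + (64 - leading - trailing)
    return total
```

For a constant series such as [1.0, 1.0, 1.0] this gives 64 + 2 = 66 bits instead of 192 raw bits, which is the kind of saving we would expect for slowly changing sensor values. If an encoder produces output larger than the raw data on such input, something is likely wrong with the implementation.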

I found a very good open source implementation of the Gorilla algorithm by Michael on GitHub (see https://github.com/burmanm/gorilla-tsc). I compared the encoding/decoding time and the compression ratio of Michael's version against the version used internally by IoTDB, and found that the IoTDB version has a lot of room for improvement.

See https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm for more experiment details.

I think we can use Michael's implementation as a reference and re-implement the algorithm inside IoTDB to reduce the encoded size (fixing potential errors along the way) and improve performance. I have created a JIRA issue (see https://issues.apache.org/jira/browse/IOTDB-938) for this. If possible, I would be happy to do the re-implementation myself.

Thanks,
Steve Su