Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2019/11/11 23:36:24 UTC

[GitHub] [cassandra] yifan-c opened a new pull request #382: Estimate UTF-8 string size based on encodeSize and add benchmarks

URL: https://github.com/apache/cassandra/pull/382

Because the `encodeSize` is already calculated when encoding a message, we can rely on that size and safely reserve only the buffer's remaining capacity for writing, which avoids resizing.

A set of benchmarks was run to show the difference. For the long text, the change cuts the string encoding time by more than half, from 571.9 ns to 216.1 ns (about 62% less). For the short text, the time drops from 62.8 ns to 36.4 ns (about 42% less).

The improvement comes from avoiding unnecessary buffer resizing and data copying.
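To make the resizing cost concrete, here is a minimal sketch. `CountingBuffer` and `ReserveDemo` are hypothetical classes written only for this illustration; they are not Netty's `ByteBuf` and not the code in this pull request. The sketch assumes the buffer was allocated with exactly the encoded size, and compares reserving a worst-case estimate (3 bytes per `char`) against reserving only the remaining capacity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical growable buffer, for illustration only (not Netty's ByteBuf).
final class CountingBuffer {
    private byte[] data;
    private int writerIndex = 0;
    int resizes = 0; // number of times the backing array was reallocated and copied

    CountingBuffer(int initialCapacity) {
        data = new byte[initialCapacity];
    }

    int writableBytes() {
        return data.length - writerIndex;
    }

    // Grow the backing array if fewer than minWritable bytes are free.
    void ensureWritable(int minWritable) {
        if (minWritable <= writableBytes()) {
            return;
        }
        data = Arrays.copyOf(data, writerIndex + minWritable); // allocation + copy
        resizes++;
    }

    // Reserve `reserve` bytes up front, then write the UTF-8 bytes of s.
    void reserveAndWriteUtf8(String s, int reserve) {
        ensureWritable(reserve);
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        ensureWritable(encoded.length); // safety net if the reservation was too small
        System.arraycopy(encoded, 0, data, writerIndex, encoded.length);
        writerIndex += encoded.length;
    }
}

final class ReserveDemo {
    // Worst-case estimate: one Java char never needs more than 3 UTF-8 bytes.
    static int maxUtf8Bytes(String s) {
        return s.length() * 3;
    }

    // Buffer is exactly sized, but we reserve the worst-case estimate.
    static int resizesWithMaxEstimate(String s, int exactSize) {
        CountingBuffer buf = new CountingBuffer(exactSize);
        buf.reserveAndWriteUtf8(s, maxUtf8Bytes(s));
        return buf.resizes;
    }

    // Buffer is exactly sized, and we reserve only what is already there.
    static int resizesWithRemainingCapacity(String s, int exactSize) {
        CountingBuffer buf = new CountingBuffer(exactSize);
        buf.reserveAndWriteUtf8(s, buf.writableBytes());
        return buf.resizes;
    }

    public static void main(String[] args) {
        String s = "hello, cassandra"; // 16 ASCII chars, so 16 UTF-8 bytes
        System.out.println(resizesWithMaxEstimate(s, 16));       // prints 1
        System.out.println(resizesWithRemainingCapacity(s, 16)); // prints 0
    }
}
```

In this sketch the over-estimate forces one reallocation and copy even though the buffer was already large enough, which is the kind of work the change is meant to avoid.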

```
[java] Benchmark                                                  Mode  Cnt    Score    Error  Units
[java] Utf8StringEncodeBench.writeLongText                        avgt    6  571.949 ± 19.791  ns/op
[java] Utf8StringEncodeBench.writeLongTextWithExactSize           avgt    6  459.932 ± 27.790  ns/op
[java] Utf8StringEncodeBench.writeLongTextWithExactSizeSkipCalc   avgt    6  216.085 ±  3.480  ns/op
[java] Utf8StringEncodeBench.writeShortText                       avgt    6   62.775 ±  6.159  ns/op
[java] Utf8StringEncodeBench.writeShortTextWithExactSize          avgt    6   44.071 ±  5.645  ns/op
[java] Utf8StringEncodeBench.writeShortTextWithExactSizeSkipCalc  avgt    6   36.358 ±  5.135  ns/op
```

- writeLongText: the original implementation, which calls `ByteBufUtils.writeUtf8`. It over-estimates the encoded size of the string, which causes the buffer to be resized.
- writeLongTextWithExactSize: calls `TypeSizes.encodeUTF8Length` to reserve exactly the number of bytes to be written.
- writeLongTextWithExactSizeSkipCalc: skips the UTF-8 length calculation. Because the encodeSize of a message is calculated before the message is encoded, the final size in bytes is already known, so we can simply reserve the buffer's remaining capacity.
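For reference, the exact-size variant needs the UTF-8 length of a string without actually encoding it. The sketch below shows one way to compute it; `Utf8Length.utf8Length` is a hypothetical helper written for this illustration, and it is not claimed to match the implementation of `TypeSizes.encodeUTF8Length`. The per-character scan is also the work that the SkipCalc variant avoids.

```java
// Hypothetical helper, for illustration only.
final class Utf8Length {
    // Number of bytes the UTF-8 encoding of s occupies, computed without encoding.
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c)
                       && i + 1 < s.length()
                       && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // a surrogate pair is one code point, encoded in 4 bytes
                i++;        // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1; // unpaired surrogate: String.getBytes substitutes '?'
            } else {
                bytes += 3; // rest of the Basic Multilingual Plane
            }
        }
        return bytes;
    }
}
```

For well-formed input, `Utf8Length.utf8Length(s)` should agree with `s.getBytes(StandardCharsets.UTF_8).length`, but it allocates no intermediate byte array.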
