You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/11/29 11:32:54 UTC
[GitHub] [flink] shuttie opened a new pull request #10358: [FLINK-14346]
[serialization] faster implementation of StringValue writeString and
readString
shuttie opened a new pull request #10358: [FLINK-14346] [serialization] faster implementation of StringValue writeString and readString
URL: https://github.com/apache/flink/pull/10358
## What is the purpose of the change
This PR implements a set of performance optimizations for String serialization and de-serialization. While running a set of state-heavy streaming jobs, we noticed that Flink spends quite a lot of CPU time (~30-40%) doing String encoding and decoding in two places: while transferring messages between the nodes, and while loading and writing objects into the state store.
We did a simple benchmark of String read/write operations compared to a default JDK's DataOutput.writeUTF ano noted a significant performance difference between Flink implementation and the JDK one.
Performance difference was 4x on decoding and 2x on encoding for 16 symbol ascii strings.
```
[info] Benchmark (length) (stringType) Mode Cnt Score Error Units
[info] StringDeserializerBenchmark.deserializeDefault 16 ascii avgt 25 251.321 ± 3.251 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 16 ascii avgt 25 77.147 ± 1.661 ns/op
[info] StringSerializerBenchmark.serializeDefault 16 ascii avgt 25 95.782 ± 0.261 ns/op
[info] StringSerializerBenchmark.serializeJDK 16 ascii avgt 25 50.786 ± 1.677 ns/op
```
For larger strings performance was degrading even more significant, 7x and 4x accordingly:
```
[info] Benchmark (length) (stringType) Mode Cnt Score Error Units
[info] StringDeserializerBenchmark.deserializeDefault 128 ascii avgt 25 1757.726 ± 3.312 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 128 ascii avgt 25 263.445 ± 6.912 ns/op
[info] StringSerializerBenchmark.serializeDefault 128 ascii avgt 25 670.627 ± 2.807 ns/op
[info] StringSerializerBenchmark.serializeJDK 128 ascii avgt 25 151.789 ± 7.295 ns/op
```
But JDK DataOutput.writeUTF cannot be directly used as a serializer replacement because of it being incompatible with the current Flink binary format for strings. Also it is not able to write strings longer than 32kb, as it uses 2-byte length encoding.
But if you compare the difference in implementation between these two algorithms, the main important difference is intermediate buffering in JDK, compared to a iterative approach in Flink. This buffering allows HotSpot to do two important optimizations:
* be able to unroll the whole encoding/decoding loop
* as there is no data dependencies between characters anymore, CPU can achieve much higher internal parallelism and spend less time being stalled.
We made a simple [POC](https://github.com/shuttie/flink-string-serializer) which alters the implementation of `StringValue.writeString` and `StringValue.readString` in a way to introduce additional buffering, which significantly improves both encoding and decoding throughput on almost all the workloads.
There is also a property-based validation test suite which ensures that old and new serialization code produce exactly the same byte sequences, and doing a round-trip from old to new, new to old, and new to new produce the same results. We ran this suite with random data for ~1h and found no differences.
## Benchmark results
Full raw benchmark results are located [here](https://github.com/shuttie/flink-string-serializer/blob/master/README.txt). We did a series of benchmarks of three string encoding/decoding impmentations:
* Flink's current `StringValue.writeString` and `StringValue.readString`
* `StringValue.writeString` and `StringValue.readString` implementations from this PR
* baseline binary-incompatible implementation of `DataInput.readUTF` and `DataOutput.writeUTF`
For a workload generator we used these parameters:
* string types: 7-bit us-ascii, russian symbols (which are usually encoded as a 14-bit varlen numbers), chinese symbols (which are usually within 21-bit varlen number range)
* string lengths: 1, 4, 8, 16, 32, 64, 128 characters.
### Ascii strings
```
[info] Benchmark (length) (stringType) Mode Cnt Score Error Units
[info] StringDeserializerBenchmark.deserializeDefault 1 ascii avgt 25 46.603 ± 0.750 ns/op
[info] StringDeserializerBenchmark.deserializeImproved 1 ascii avgt 25 51.074 ± 0.720 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 1 ascii avgt 25 63.402 ± 1.631 ns/op
[info] StringSerializerBenchmark.serializeDefault 1 ascii avgt 25 31.595 ± 0.489 ns/op
[info] StringSerializerBenchmark.serializeImproved 1 ascii avgt 25 33.454 ± 0.151 ns/op
[info] StringSerializerBenchmark.serializeJDK 1 ascii avgt 25 34.721 ± 0.128 ns/op
[info] StringDeserializerBenchmark.deserializeDefault 16 ascii avgt 25 251.321 ± 3.251 ns/op
[info] StringDeserializerBenchmark.deserializeImproved 16 ascii avgt 25 55.385 ± 1.176 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 16 ascii avgt 25 77.147 ± 1.661 ns/op
[info] StringSerializerBenchmark.serializeDefault 16 ascii avgt 25 95.782 ± 0.261 ns/op
[info] StringSerializerBenchmark.serializeImproved 16 ascii avgt 25 51.806 ± 0.180 ns/op
[info] StringSerializerBenchmark.serializeJDK 16 ascii avgt 25 50.786 ± 1.677 ns/op
[info] StringDeserializerBenchmark.deserializeDefault 128 ascii avgt 25 1757.726 ± 3.312 ns/op
[info] StringDeserializerBenchmark.deserializeImproved 128 ascii avgt 25 140.374 ± 1.006 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 128 ascii avgt 25 263.445 ± 6.912 ns/op
[info] StringSerializerBenchmark.serializeDefault 128 ascii avgt 25 670.627 ± 2.807 ns/op
[info] StringSerializerBenchmark.serializeImproved 128 ascii avgt 25 161.481 ± 2.798 ns/op
[info] StringSerializerBenchmark.serializeJDK 128 ascii avgt 25 151.789 ± 7.295 ns/op
```
So for ascii strings:
* on 1-char strings the new implementation is a bit slower than the old one.
* on 16-char strings encoding is **2x** faster, decoding is **5x** faster
* on 128-char strings encoding is **4x** faster, decoding is **12x** faster
### Non-ascii strings
```
[info] Benchmark (length) (stringType) Mode Cnt Score Error Units
[info] StringDeserializerBenchmark.deserializeDefault 1 chinese avgt 25 77.743 ± 1.635 ns/op
[info] StringDeserializerBenchmark.deserializeImproved 1 chinese avgt 25 78.814 ± 1.329 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 1 chinese avgt 25 66.005 ± 1.325 ns/op
[info] StringSerializerBenchmark.serializeDefault 1 chinese avgt 25 36.767 ± 3.662 ns/op
[info] StringSerializerBenchmark.serializeImproved 1 chinese avgt 25 36.382 ± 0.153 ns/op
[info] StringSerializerBenchmark.serializeJDK 1 chinese avgt 25 36.845 ± 0.575 ns/op
[info] StringDeserializerBenchmark.deserializeDefault 16 chinese avgt 25 669.156 ± 3.021 ns/op
[info] StringDeserializerBenchmark.deserializeImproved 16 chinese avgt 25 182.587 ± 4.843 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 16 chinese avgt 25 148.063 ± 2.204 ns/op
[info] StringSerializerBenchmark.serializeDefault 16 chinese avgt 25 244.844 ± 1.079 ns/op
[info] StringSerializerBenchmark.serializeImproved 16 chinese avgt 25 86.651 ± 1.316 ns/op
[info] StringSerializerBenchmark.serializeJDK 16 chinese avgt 25 81.840 ± 1.976 ns/op
[info] StringDeserializerBenchmark.deserializeDefault 128 chinese avgt 25 5147.632 ± 30.068 ns/op
[info] StringDeserializerBenchmark.deserializeImproved 128 chinese avgt 25 714.912 ± 26.240 ns/op
[info] StringDeserializerBenchmark.deserializeJDK 128 chinese avgt 25 738.740 ± 7.291 ns/op
[info] StringSerializerBenchmark.serializeDefault 128 chinese avgt 25 1889.786 ± 8.541 ns/op
[info] StringSerializerBenchmark.serializeImproved 128 chinese avgt 25 388.404 ± 2.511 ns/op
[info] StringSerializerBenchmark.serializeJDK 128 chinese avgt 25 401.011 ± 3.157 ns/op
```
So for non-ascii chinese character strings:
* on 1-char strings there is no performance difference.
* on 16-char strings encoding is **3x** faster, decoding is **4x** faster
* on 128-char strings encoding is **5x** faster, decoding is **7x** faster
## Brief change log
- Replace an existing `StringValue.writeString` and `StringValue.readString` methods with improved ones.
- Add an additional test case to StringSerializationTest to cover utf8 encoding/decoding.
## Verifying this change
This change is already covered by existing tests, such as StringSerializationTest.
Also this change added additional test cases and can be verified as follows:
- Add an additional test case to StringSerializationTest to cover utf8 encoding/decoding.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (**yes** / no / don't know)
- The runtime per-record code paths (performance sensitive): (**yes** / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services