Posted to issues@flink.apache.org by "Roman Grebennikov (Jira)" <ji...@apache.org> on 2019/11/29 11:37:00 UTC
[jira] [Commented] (FLINK-14346) Performance issue with StringSerializer

    [ https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984917#comment-16984917 ] 

Roman Grebennikov commented on FLINK-14346:
-------------------------------------------

I've opened a proper PR that ports these performance optimizations from the PoC to the Flink codebase. It would be great if someone could review it.

The motivation behind this change is described in the PR description; the most important benchmark results look quite good (average time per operation, lower is better):
{noformat}
[info] Benchmark                                        (length)  (stringType)  Mode  Cnt     Score    Error  Units
[info] StringDeserializerBenchmark.deserializeDefault          1         ascii  avgt   25    46.603 ± 0.750  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         1         ascii  avgt   25    51.074 ± 0.720  ns/op
[info] StringDeserializerBenchmark.deserializeJDK              1         ascii  avgt   25    63.402 ± 1.631  ns/op
[info] StringSerializerBenchmark.serializeDefault              1         ascii  avgt   25    31.595 ± 0.489  ns/op
[info] StringSerializerBenchmark.serializeImproved             1         ascii  avgt   25    33.454 ± 0.151  ns/op
[info] StringSerializerBenchmark.serializeJDK                  1         ascii  avgt   25    34.721 ± 0.128  ns/op

[info] StringDeserializerBenchmark.deserializeDefault         16         ascii  avgt   25   251.321 ± 3.251  ns/op
[info] StringDeserializerBenchmark.deserializeImproved        16         ascii  avgt   25    55.385 ± 1.176  ns/op
[info] StringDeserializerBenchmark.deserializeJDK             16         ascii  avgt   25    77.147 ± 1.661  ns/op
[info] StringSerializerBenchmark.serializeDefault             16         ascii  avgt   25    95.782 ± 0.261  ns/op
[info] StringSerializerBenchmark.serializeImproved            16         ascii  avgt   25    51.806 ± 0.180  ns/op
[info] StringSerializerBenchmark.serializeJDK                 16         ascii  avgt   25    50.786 ± 1.677  ns/op

[info] StringDeserializerBenchmark.deserializeDefault        128         ascii  avgt   25  1757.726 ± 3.312  ns/op
[info] StringDeserializerBenchmark.deserializeImproved       128         ascii  avgt   25   140.374 ± 1.006  ns/op
[info] StringDeserializerBenchmark.deserializeJDK            128         ascii  avgt   25   263.445 ± 6.912  ns/op
[info] StringSerializerBenchmark.serializeDefault            128         ascii  avgt   25   670.627 ± 2.807  ns/op
[info] StringSerializerBenchmark.serializeImproved           128         ascii  avgt   25   161.481 ± 2.798  ns/op
[info] StringSerializerBenchmark.serializeJDK                128         ascii  avgt   25   151.789 ± 7.295  ns/op
{noformat}

> Performance issue with StringSerializer
> ---------------------------------------
>
>                 Key: FLINK-14346
>                 URL: https://issues.apache.org/jira/browse/FLINK-14346
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 1.9.0
>         Environment: Tested on Flink 1.9.0, adoptopenjdk 8u222.
>            Reporter: Roman Grebennikov
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, we found that a significant amount of CPU time is spent inside StringSerializer writing data to the underlying byte buffer. The hottest part of the code is the StringValue.writeString function. Replacing the default StringSerializer with a custom one that simply calls DataOutput.writeUTF/readUTF (just to get a baseline) surprisingly yielded an almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the latter with the former is not a good idea in general, as it may break checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this performance issue. The main reason the JDK's writeUTF is faster is that its code does not write to the output stream byte by byte, but instead fills a temporary byte buffer first. This lets HotSpot unroll the main loop almost perfectly, which results in much better data parallelism.
> I've tried to port the ideas from the JDK's implementation of writeUTF back to StringValue.writeString, and the current result shows a significant speedup over the current implementation:
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>  
> {{Where measureJDK is the JDK's writeUTF as a baseline, measureOld is the current upstream implementation in Flink, and measureNew is the improved one. }}
>  
> {{The code for the benchmark (and the improved version of the serializer) is here: [https://github.com/shuttie/flink-string-serializer]}}
>  
> {{Next steps:}}
>  # {{More benchmarks for non-ascii strings.}}
>  # {{Benchmarks for long strings.}}
>  # {{Benchmarks for deserialization.}}
>  # {{Tests for old-new wire format compatibility.}}
>  # {{PR to the Flink codebase.}}
> {{Is there interest in this kind of performance improvement?}}
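
The buffering idea described in the issue text above can be illustrated with a small sketch. It is not the code from the PR or from the linked repository. It assumes the writeString wire format is a variable-length encoded (length + 1) followed by each char variable-length encoded in 7-bit groups; that layout, the class name, and all method names here are assumptions made for illustration. The sketch writes the same bytes once directly to the stream and once through a temporary byte array, and also prints what DataOutput.writeUTF produces to show why the two formats are not interchangeable.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration class, not part of Flink.
final class BufferedStringWriteSketch {
    private static final int HIGH_BIT = 0x80;

    // Byte-by-byte variant: every encoded byte is a separate stream call.
    static byte[] directBytes(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            int len = s.length() + 1; // assumed: 0 is reserved for null
            while (len >= HIGH_BIT) {
                out.write(len | HIGH_BIT);
                len >>>= 7;
            }
            out.write(len);
            for (int i = 0; i < s.length(); i++) {
                int c = s.charAt(i);
                while (c >= HIGH_BIT) {
                    out.write(c | HIGH_BIT);
                    c >>>= 7;
                }
                out.write(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Buffered variant: encode into a temporary array, then write it once.
    static byte[] bufferedBytes(String s) {
        int n = s.length();
        // Worst case: 5 bytes for the length, 3 bytes per 16-bit char.
        // (A real implementation would guard against overflow for huge strings.)
        byte[] buf = new byte[5 + 3 * n];
        int pos = 0;
        int len = n + 1;
        while (len >= HIGH_BIT) {
            buf[pos++] = (byte) (len | HIGH_BIT);
            len >>>= 7;
        }
        buf[pos++] = (byte) len;
        for (int i = 0; i < n; i++) {
            int c = s.charAt(i);
            while (c >= HIGH_BIT) {
                buf[pos++] = (byte) (c | HIGH_BIT);
                c >>>= 7;
            }
            buf[pos++] = (byte) c;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.write(buf, 0, pos); // single bulk write
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // JDK baseline: 2-byte length prefix followed by modified UTF-8.
    static byte[] utfBytes(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println("direct:   " + Arrays.toString(directBytes(s)));
        System.out.println("buffered: " + Arrays.toString(bufferedBytes(s)));
        System.out.println("writeUTF: " + Arrays.toString(utfBytes(s)));
    }
}
```

The direct and buffered variants produce identical bytes, so a change of this kind would not alter the wire format by itself; whether it is faster in practice has to be measured, as the benchmarks above do.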


