You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by John Tipper <jo...@hotmail.com> on 2019/06/03 23:07:11 UTC

Is it possible to configure Flink pre-flight type serialization scanning?

Flink performs significant scanning during the pre-flight phase of a Flink application (https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html). The act of creating sources, operators and sinks causes Flink to scan the data types of the objects that are used within the topology of a given streaming flow as apparently Flink will try to optimise jobs based on this information.

Is this scanning configurable? Can I turn it off and just force Flink to use Kryo serialisation only and not need or use any of this scanned information?

I have a very large, deeply nested class in a proprietary library that was auto generated and Flink seems to get into a very large endless loop when scanning it that results in out of memory errors after running for several hours (the application never actually launches via env.execute(), even if I bump up the heap size significantly). The class has a number of circular references, i.e. class and its child classes contains references to other classes of the same type, is this likely to be a problem?

Many thanks,

John