Posted to user@flink.apache.org by Andrea Spina <an...@radicalbit.io> on 2019/07/04 06:37:19 UTC

Providing Custom Serializer for Generic Type

Dear community,
in my job I use a custom event type *MyClass*, a sort of "generic event"
that I handle throughout my streaming flow both as an event
(DataStream[MyClass]) and as managed state.

I see that Flink warns me about generic serialization of *MyClass*:
 INFO [run-main-0] (TypeExtractor.java:1818) - class io.radicalbit.MyClass does not contain a setter for field io$radicalbit$MyClass$$schema
 INFO [run-main-0] (TypeExtractor.java:1857) - Class class io.radicalbit.MyClass cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
 INFO [run-main-0] (TypeExtractor.java:1818) - class io.radicalbit.MyClass does not contain a setter for field io$radicalbit$MyClass$schema

So I wanted to provide a custom serializer for MyClass. I first tried to
register the Java one, following [1], to check whether the system
recognizes it, but it seems that it is not considered.

I then read [2] (a case very similar to mine) and, AFAIU, I need to
implement a custom TypeInformation and TypeSerializer for my class, as
suggested in [3], because Flink will ignore my registered serializer as
long as it considers my type *generic*.

config.registerTypeWithKryoSerializer(classOf[MyClass], classOf[RadicalSerde])
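
For context, here is a minimal sketch of what such a registration could
look like end to end. The MyClass below is a simplified stand-in (a schema
string plus string values) and the body of RadicalSerde is hypothetical; it
assumes the Kryo 2.x Serializer API that Flink 1.8 bundles and is not the
actual implementation.

```scala
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// Hypothetical stand-in for io.radicalbit.MyClass: a schema plus values.
final class MyClass(val schema: String, val values: Array[String])

// Hypothetical stand-in for RadicalSerde. It needs a no-argument
// constructor because only the serializer class is registered.
class RadicalSerde extends Serializer[MyClass] {
  override def write(kryo: Kryo, output: Output, obj: MyClass): Unit = {
    output.writeString(obj.schema)
    output.writeInt(obj.values.length)
    obj.values.foreach(v => output.writeString(v))
  }

  override def read(kryo: Kryo, input: Input, tpe: Class[MyClass]): MyClass = {
    val schema = input.readString()
    val size = input.readInt()
    // Array.fill evaluates its argument once per element, in order.
    val values = Array.fill(size)(input.readString())
    new MyClass(schema, values)
  }
}

object RegisterSerde {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val config = env.getConfig
    // Same call as above, applied to the stand-in classes.
    config.registerTypeWithKryoSerializer(classOf[MyClass], classOf[RadicalSerde])
  }
}
```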


My question, finally, is: do I need to provide these custom classes? Is
there any practical example of creating custom type information like the
one mentioned above? I have had a quick preliminary look, but it seems that
I need to provide a non-trivial amount of information to the
TypeInformation and TypeSerializer interfaces.

Thank you for your excellent work and help.

Cheers.

[1] - https://ci.apache.org/projects/flink/flink-docs-stable/dev/custom_serializers.html
[2] - http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Serializer-for-Avro-GenericRecord-td25433.html
[3] - https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/types_serialization.html#defining-type-information-using-a-factory
-- 
*Andrea Spina*
Head of R&D @ Radicalbit Srl
Via Giovanni Battista Pirelli 11, 20124, Milano - IT

Re: Providing Custom Serializer for Generic Type

Posted by Andrea Spina <an...@radicalbit.io>.
Hi Gordon, thank you.
The data structure involved is a complex abstraction that owns a schema and
values; it declares private fields which users should not edit directly.
I'd say it is very similar to an Avro GenericRecord. How would you approach
the problem if you had to serialize/deserialize an Avro GenericRecord
efficiently? I think it cannot be a POJO, and ser/de using Avro brings a
lot of overhead, as also described in [1].

Thank you very much for your help.

Andrea

[1] - http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Serializer-for-Avro-GenericRecord-td25433.html

On Thu, 4 Jul 2019 at 11:23, Tzu-Li (Gordon) Tai <tzulitai@apache.org>
wrote:

> Hi Andrea,
>
> Is there a specific reason you want to use a custom TypeInformation /
> TypeSerializer for your type?
> From the description in the original post, this part wasn't clear to me.
>
> If the only reason is because it is generally suggested to avoid generic
> type serialization via Kryo, both for performance reasons as well as
> evolvability in the future, then updating your type to be recognized by
> Flink as one of the supported types [1] would be enough.
> Otherwise, implementing your own type information and serializer is
> usually only something users with very specific use cases might be required
> to do.
> Since you are also using that type as managed state, for a safer schema
> evolvability story in the future, I would recommend either Avro or Pojo as
> Jingsong Lee had already mentioned.
>
> Cheers,
> Gordon
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/types_serialization.html#flinks-typeinformation-class

Re: Providing Custom Serializer for Generic Type

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Andrea,

Is there a specific reason you want to use a custom TypeInformation /
TypeSerializer for your type?
From the description in the original post, this part wasn't clear to me.

If the only reason is because it is generally suggested to avoid generic
type serialization via Kryo, both for performance reasons as well as
evolvability in the future, then updating your type to be recognized by
Flink as one of the supported types [1] would be enough.
Otherwise, implementing your own type information and serializer is usually
only something users with very specific use cases might be required to do.
Since you are also using that type as managed state, for a safer schema
evolvability story in the future, I would recommend either Avro or Pojo as
Jingsong Lee had already mentioned.
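
One way to verify which category a type falls into could be sketched as
follows. SensorEvent and TypeInfoCheck are hypothetical names used only for
illustration; TypeInformation.of and ExecutionConfig.disableGenericTypes
are existing Flink calls, but treat the overall shape as a sketch rather
than a recommended setup.

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.{GenericTypeInfo, PojoTypeInfo}
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// Hypothetical event type: public class, no-argument constructor,
// mutable fields with generated accessors.
class SensorEvent(var id: String, var reading: Double) {
  def this() = this("", 0.0)
}

object TypeInfoCheck {
  // Reports how Flink's type extraction classifies the given class.
  def describe[T](clazz: Class[T]): String =
    TypeInformation.of(clazz) match {
      case _: PojoTypeInfo[_]    => "POJO"
      case _: GenericTypeInfo[_] => "generic (Kryo)"
      case other                 => other.toString
    }

  def main(args: Array[String]): Unit = {
    println(describe(classOf[SensorEvent]))

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // With generic types disabled, Flink is expected to raise an error
    // instead of falling back to Kryo for a type it treats as generic.
    env.getConfig.disableGenericTypes()
  }
}
```

Disabling generic types is a blunt check: it affects every type in the job,
so it is probably most useful in a test run rather than in production.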

Cheers,
Gordon

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/types_serialization.html#flinks-typeinformation-class

On Thu, Jul 4, 2019 at 5:08 PM Andrea Spina <an...@radicalbit.io>
wrote:

> Hi JingsongLee, thank you for your answer.
> Honestly, I wanted to explore that as a last resort. Anyway, if defining
> custom serializers and type information involves quite a big effort, I
> would reconsider my approach.
>
> Cheers,

Re: Providing Custom Serializer for Generic Type

Posted by Andrea Spina <an...@radicalbit.io>.
Hi JingsongLee, thank you for your answer.
Honestly, I wanted to explore that as a last resort. Anyway, if defining
custom serializers and type information involves quite a big effort, I
would reconsider my approach.

Cheers,

On Thu, 4 Jul 2019 at 08:46, JingsongLee <lz...@aliyun.com> wrote:

> Hi Andrea:
> Why not make your *MyClass* a POJO? [1] If it is a POJO, then Flink
> will use PojoTypeInfo and PojoSerializer, which already have a good
> implementation.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/types_serialization.html#rules-for-pojo-types
>
> Best, JingsongLee

Re: Providing Custom Serializer for Generic Type

Posted by JingsongLee <lz...@aliyun.com>.
Hi Andrea:
Why not make your MyClass a POJO? [1] If it is a POJO, then Flink
will use PojoTypeInfo and PojoSerializer, which already have a good
implementation.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/types_serialization.html#rules-for-pojo-types
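
As a rough sketch of what the rules in [1] ask for, a Scala class along
these lines should qualify: a public top-level class, a public no-argument
constructor, and every field reachable through a getter and a setter.
MyEvent and its two fields are hypothetical, not the real MyClass, and
whether Flink actually picks PojoTypeInfo also depends on the field types
being supported.

```scala
import scala.beans.BeanProperty

// Hypothetical POJO-style event: @BeanProperty generates getSchema/setSchema
// and getValues/setValues next to the Scala accessors.
class MyEvent(
    @BeanProperty var schema: String,
    @BeanProperty var values: Array[String]) {

  // Public no-argument constructor, required by the POJO rules.
  def this() = this(null, null)
}

object PojoCheck {
  def main(args: Array[String]): Unit = {
    val event = new MyEvent()
    event.setSchema("id:long")
    event.setValues(Array("42"))
    println(s"${event.getSchema} -> ${event.getValues.mkString(",")}")
  }
}
```

The trade-off is that the fields become settable by any caller, which may
not fit a type that wants to keep its schema private.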

Best, JingsongLee

