Posted to user@flink.apache.org by mrooding <ad...@webresource.nl> on 2017/10/05 08:39:03 UTC

Re: Savepoints and migrating value state data types

Gordon

Thanks for the detailed response. I have verified your assumption and,
unfortunately, that is indeed the case.

I also looked into creating a custom Kryo serializer but I got stuck on
serializing arrays of complex types. It seems like this isn't trivial at all
with Kryo.

As an alternative, I've been looking into using Avro only for Flink's state
buffers. As a first step, we'd still send JSON messages through Kafka but use
a custom TypeSerializer that converts the case classes to and from bytes
using Avro. Unfortunately, documentation on this is really scarce.

In a different thread,
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Serialization-performance-td12019.html,
it's mentioned that Avro is a bit of an odd citizen and that the
AvroSerializer provided by Flink uses Kryo. This matches what I found by
going through Flink's source code myself.

I hope that you can provide me with some pointers. Is extending
TypeSerializer[T] the best way forward if we only want to use Avro for state
buffers and thus utilize Avro's schema migration facilities?
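
To make concrete what those schema migration facilities buy: the property
we're after is that state written by an old job version stays readable after
a field is added. In miniature, and independent of both Avro and Flink (all
names below are illustrative, not from any library), it amounts to
hand-rolled version dispatch:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// v1 of the state type only had `id`; v2 added `note`.
case class Order(id: Long, note: String)

object OrderCodec {
  // Each record carries an explicit format version, which is the role
  // Avro's writer schema plays.
  def writeV1(id: Long): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    out.writeByte(1)
    out.writeLong(id)
    out.flush()
    bos.toByteArray
  }

  def writeV2(order: Order): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    out.writeByte(2)
    out.writeLong(order.id)
    out.writeUTF(order.note)
    out.flush()
    bos.toByteArray
  }

  // The reader dispatches on the version and fills in a default for the
  // missing field; Avro's schema resolution automates exactly this step.
  def read(bytes: Array[Byte]): Order = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    in.readByte().toInt match {
      case 1 => Order(in.readLong(), note = "")
      case 2 => Order(in.readLong(), in.readUTF())
      case v => throw new IllegalStateException(s"unknown version $v")
    }
  }
}
```

Avro replaces the manual dispatch with writer/reader schema resolution,
which is why it scales better for long-lived state than hand-written
versioning like the above.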

Any pointers would be greatly appreciated!

Kind regards

Marc



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Savepoints and migrating value state data types

Posted by Aljoscha Krettek <al...@apache.org>.
Actually, Flink 1.4 will come with improved Avro support. See especially:

 - https://issues.apache.org/jira/browse/FLINK-7420: Move All Avro Code to flink-avro
 - https://issues.apache.org/jira/browse/FLINK-7997: Avro should be always in the user code
 - https://issues.apache.org/jira/browse/FLINK-6022: Improve support for Avro GenericRecord

This makes AvroTypeInfo and AvroSerializer quite usable.

> On 6. Nov 2017, at 15:13, mrooding <ad...@webresource.nl> wrote:
> 
> Hi Gordon
> 
> I've been looking into creating a custom AvroSerializer without Kryo which
> would support Avro schemas and I'm starting to wonder if this is actually
> the most straightforward way to do it. 
> 
> If I extend a class from TypeSerializer I would also need to implement a
> TypeInformation class to be able to provide my serializer. Implementing all
> these classes seems to be quite the ordeal without proper documentation. Are
> you sure that this is the right way forward and that there's no other option
> of using Avro serialization with schema support for Flink?
> 
> Thanks again
> 
> Marc
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Savepoints and migrating value state data types

Posted by mrooding <ad...@webresource.nl>.
Hi Gordon

I've been looking into creating a custom AvroSerializer without Kryo which
would support Avro schemas and I'm starting to wonder if this is actually
the most straightforward way to do it. 

If I extend TypeSerializer, I would also need to implement a
TypeInformation class in order to provide my serializer. Implementing all of
these classes seems to be quite an ordeal without proper documentation. Are
you sure that this is the right way forward and that there's no other option
for using Avro serialization with schema support in Flink?

Thanks again

Marc






Re: Savepoints and migrating value state data types

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi,

Yes, the AvroSerializer currently still partially uses Kryo for object copying.
Also, right now, I think the AvroSerializer is only used when the type is recognized as a POJO and `isForceAvroEnabled` is set on the job configuration. I’m not sure that is always possible.
As mentioned in [1], we would probably need to improve the user experience for Avro usage.

For now, if you want to use Avro only for serializing your state, AFAIK the most straightforward approach would be, as you mentioned, to implement a custom TypeSerializer that uses the Avro constructs.
Flink’s AvroSerializer actually already does this, more or less, so you can refer to that implementation as a baseline.
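
For reference, the overall shape of such a serializer against the 1.3-era
TypeSerializer API would be roughly as below. This is a sketch, not a tested
implementation: it assumes Avro's reflect-based schema works for the case
classes in question, uses Flink's internal DataOutputViewStream /
DataInputViewStream adapters, and elides several required overrides
(copying, equality, and the snapshot/compatibility hooks that savepoint
migration relies on):

```scala
class AvroStateSerializer[T](clazz: Class[T]) extends TypeSerializer[T] {

  // Avro's reflect machinery is not serializable, so build it lazily
  // on the task manager rather than shipping it with the job graph.
  @transient private lazy val schema = ReflectData.get().getSchema(clazz)
  @transient private lazy val writer = new ReflectDatumWriter[T](schema)
  @transient private lazy val reader = new ReflectDatumReader[T](schema)

  override def serialize(record: T, target: DataOutputView): Unit = {
    val encoder = EncoderFactory.get()
      .directBinaryEncoder(new DataOutputViewStream(target), null)
    writer.write(record, encoder)
  }

  override def deserialize(source: DataInputView): T = {
    val decoder = DecoderFactory.get()
      .directBinaryDecoder(new DataInputViewStream(source), null)
    reader.read(null.asInstanceOf[T], decoder)
  }

  override def getLength: Int = -1        // records are variable-length
  override def isImmutableType: Boolean = false

  // Still required: duplicate, createInstance, the copy(...) variants,
  // equals/hashCode/canEqual, and snapshotConfiguration/ensureCompatibility,
  // which is where the schema-migration logic would hook in.
}
```

Flink's own AvroSerializer follows this pattern, so diffing against it is the
safest baseline; the matching TypeInformation can be as small as a class that
returns this serializer from createSerializer.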

Cheers,
Gordon

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Serialization-performance-td12019.html
On 5 October 2017 at 4:39:10 PM, mrooding (admin@webresource.nl) wrote:
