Posted to user@flink.apache.org by Andrea Sella <me...@alkagin.xyz> on 2018/11/16 12:10:18 UTC

Flink Serialization and case class fields limit

Hey squirrels,

I've started to study Flink serialization and its "type system" in more
depth.

I have a case class generated by ScalaPB that has more than 30 fields.
I've seen that Flink still uses the CaseClassSerializer, and the
TypeInformation is CaseClassTypeInfo, even though the docs [1] say
otherwise (a 22-field limit). I'd have expected a GenericTypeInfo, but all
is well because the CaseClassSerializer is faster than Kryo. Did I
misunderstand the documentation, or does the limitation no longer apply?

Another thing: I've read the "Juggling with Bits and Bytes" [2] blog post
and I would like to replicate the experiment with some tailored changes to
dive deeper into the topic. Is the source code on GitHub or somewhere
else?

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/types_serialization.html#flinks-typeinformation-class
[2]
https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html

Thank you,
Andrea
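
One way to check which TypeInformation Flink derives for a wide case class
is sketched below. It is a minimal sketch, not code from this thread: it
assumes the Flink Scala API (flink-scala, e.g. 1.5.x) is on the classpath,
and Wide23 is a hypothetical 23-field stand-in for the generated ScalaPB
class, which carries extra traits and may behave differently.

```scala
import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.GenericTypeInfo
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo

// Hypothetical stand-in for the generated class: 23 fields, one past 22.
case class Wide23(
  f1: Int, f2: Int, f3: Int, f4: Int, f5: Int, f6: Int, f7: Int, f8: Int,
  f9: Int, f10: Int, f11: Int, f12: Int, f13: Int, f14: Int, f15: Int,
  f16: Int, f17: Int, f18: Int, f19: Int, f20: Int, f21: Int, f22: Int,
  f23: Int)

object TypeInfoCheck {
  def main(args: Array[String]): Unit = {
    // createTypeInformation is the macro the Scala API uses to derive type info.
    val info: TypeInformation[Wide23] = createTypeInformation[Wide23]
    val serializer = info.createSerializer(new ExecutionConfig)

    // Print what was derived instead of assuming the outcome.
    println(s"type info class:   ${info.getClass.getName}")
    println(s"CaseClassTypeInfo: ${info.isInstanceOf[CaseClassTypeInfo[_]]}")
    println(s"GenericTypeInfo:   ${info.isInstanceOf[GenericTypeInfo[_]]}")
    println(s"serializer class:  ${serializer.getClass.getName}")
  }
}
```

If GenericTypeInfo is reported as true, Flink would fall back to Kryo for
that type; the thread reports CaseClassTypeInfo on Scala 2.11 with Flink
1.5.5.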

Re: Flink Serialization and case class fields limit

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Andrea,

I wrote the post 2.5 years ago. Sorry, I don't think that I kept the code
anywhere, but the mechanics in Flink should still be the same.

Best, Fabian

Re: Flink Serialization and case class fields limit

Posted by Andrea Sella <me...@alkagin.xyz>.
Hi Andrey,

My bad, I forgot to say that I am using Scala 2.11 and Flink 1.5.5; that’s
why I asked about the limitation.

If I recall correctly, CaseClassSerializer and CaseClassTypeInfo don’t rely
on the unapply and tupled functions, so I'd say that Flink doesn't have
this kind of limitation with Scala 2.11. Correct?

Thank you,
Andrea

Re: Flink Serialization and case class fields limit

Posted by Andrey Zagrebin <an...@data-artisans.com>.
Hi Andrea,

The 22-field limit comes from Scala [1], not Flink.
I am not sure whether there is a repo for the post, but I have also cc'ed Fabian; maybe he can point to one if it exists.

Best,
Andrey

[1] https://underscore.io/blog/posts/2016/10/11/twenty-two.html
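
To illustrate what the Scala-side limit looks like in practice, here is a
minimal plain-Scala sketch, not code from this thread. It assumes Scala
2.11 to 2.13; Wide23 and WideDemo are hypothetical names. To my
understanding, a case class with more than 22 fields compiles on these
versions and keeps its Product methods, field accessors and copy, while
the companion gets no tupled or unapply.

```scala
// Hypothetical 23-field case class, one field past Tuple22/Function22.
case class Wide23(
  f1: Int, f2: Int, f3: Int, f4: Int, f5: Int, f6: Int, f7: Int, f8: Int,
  f9: Int, f10: Int, f11: Int, f12: Int, f13: Int, f14: Int, f15: Int,
  f16: Int, f17: Int, f18: Int, f19: Int, f20: Int, f21: Int, f22: Int,
  f23: Int)

object WideDemo {
  val sample: Wide23 = Wide23(
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)

  def main(args: Array[String]): Unit = {
    // Product-based access does not depend on tuples, so it still works.
    println(sample.productArity)       // 23
    println(sample.productElement(22)) // 23 (the last field, zero-based index)

    // Named field access and copy are unaffected as well.
    println(sample.copy(f1 = 100).f1)  // 100

    // Assumption for Scala 2.11 to 2.13: the two lines below do not compile,
    // because no tupled/unapply is generated for more than 22 fields.
    // val t = (Wide23.apply _).tupled
    // val u = Wide23.unapply(sample)
  }
}
```

A serializer that only reads fields and calls the constructor, rather than
going through tupled or unapply, would not run into this limit.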
