You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Nick Nezis (Jira)" <ji...@apache.org> on 2021/01/06 19:15:00 UTC

[jira] [Comment Edited] (SPARK-34023) Updated to Kryo 5.0.3

    [ https://issues.apache.org/jira/browse/SPARK-34023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259996#comment-17259996 ] 

Nick Nezis edited comment on SPARK-34023 at 1/6/21, 7:14 PM:
-------------------------------------------------------------

Good question. And perhaps the answer is that it's not worth doing the upgrade. I'll try to do my best to gather answers for these questions. 

{quote}
Some of the major changes which were documented [in Google Groups.|https://groups.google.com/g/kryo-users/c/sBZ10dwrwFQ] This write up is from 2018, so not sure if any of the information is outdated:
 * Generics has been redone to support all scenarios. [GenericsUtil|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/GenericsUtil.java] thoroughly discovers the generic types that are known at compile time ([tests|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/test/com/esotericsoftware/kryo/util/GenericsUtilTest.java]), which avoids as much work as possible during serialization. The generic types known only at serialization time are tracked by Kryo's instance of [Generics|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/Generics.java], which maintains a stack of [GenericType|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/Generics.java#L221-L222] instances (lots of nice javadocs in those classes).

How does using the new generics API look? For a simple example with just one type parameter, see [CollectionSerializer|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/CollectionSerializer.java#L80-L97]. To get the class, call kryo.getGenerics().nextGenericClass() then after reading the child objects call kryo.getGenerics().popGenericType(). Simple!

nextGenericClass() is a shortcut for the common case of a class with a single type parameter. When there are multiple type parameters, like [Map|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/MapSerializer.java#L79-L89], use nextGenericTypes() instead, then call resolve() to get each class. Also, the last parameter is made current automatically (meaning it is pushed by Generics#nextGenericTypes()), so the other parameter(s) have to be [pushed and popped manually|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/MapSerializer.java#L93-L101]. Somewhat less simple!

It's a little tricky, but these two patterns are all serializers need to worry about. It provides full support for nested generic types, eg HashMap<ArrayList<Integer>, ArrayList<String>>, which wasn't possible before.

Feedback on all this is welcome, but the core of it is relatively complex and may induce a headache (it certainly did when writing it!). The important bits are 1) to handle all generics scenarios, and 2) to minimize work at serialization time. Digging through just the calls for generics made from serializers, you'll find minimal work is done and without allocations. Given this, generics are always enabled.


 * Serializer method signatures have changed for [issue #146|https://github.com/EsotericSoftware/kryo/pull/146] in [this commit|https://github.com/EsotericSoftware/kryo/commit/d22a23dccd9b38461950084188f69597f29c4a9a]. All Serializer classes need to be changed from Class<T> to Class<? extends T>.


 * Various serialization improvements. Eg, TaggedFieldSerializer now has acceptsNull=true, which saves 1 byte for all non-null objects.


 * Unsafe had permeated the API. IMO it should be sandboxed as much as possible. I began refactoring by removing Unsafe support completely, made everything nice, then put back the input/output streams. I haven't yet looked into what serializers would need to make use of all of the Unsafe features. With Java 9 dropping Unsafe, maybe it is not worth the considerable effort to support it. FWIW, I personally don't use Unsafe but I know others do. I assume some even choosing Kryo specifically for Unsafe. I don't know how those people feel about Java 9.

I have not looked at Java 9 at all yet. I don't know if it provides APIs that Kryo can use to be more efficient.

 * FieldSerializer had gotten messy internally. It did a lot more work than necessary to build the cached fields. It did things with generics that were suspect and did expensive computations that were not even used. It had a lot of Unsafe logic.

 * Various API improvements. The Kryo class must not become a junk drawer for serializer settings. The SerializerFactory classes can be used, eg [FieldSerializerFactory|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/SerializerFactory.java#L96-L98], which can create new, configured serializers ([example|http://n4te.com/x/367-0yar.txt]). Having Input/Output varint and varlong be a hint was odd. If this ends up being needed again, maybe depending on what happens with Unsafe, it could be done in Input/Output subclasses without affecting the base class API. Javadocs are much improved.

 * Logging is improved. Some useless junk was logged, some important junk was not logged, and the formatting of log messages was not consistent.

 * Deserialization still [temporarily modifies|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/io/Input.java#L604-L606] the input buffer. While in many cases this is fine, it can be quite an unexpected gotcha in some cases. I'd like to remedy this but string writing is important and any changes here need extensive benchmarks.
{quote}


was (Author: nicknezis):
Good question. And perhaps the answer is that it's not worth doing the upgrade. I'll try to do my best to gather answers for these questions. 

Some of the major changes which were documented [in Google Groups.|https://groups.google.com/g/kryo-users/c/sBZ10dwrwFQ] This write up is from 2018, so not sure if any of the information is outdated:
 * Generics has been redone to support all scenarios. [GenericsUtil|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/GenericsUtil.java] thoroughly discovers the generic types that are known at compile time ([tests|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/test/com/esotericsoftware/kryo/util/GenericsUtilTest.java]), which avoids as much work as possible during serialization. The generic types known only at serialization time are tracked by Kryo's instance of [Generics|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/Generics.java], which maintains a stack of [GenericType|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/Generics.java#L221-L222] instances (lots of nice javadocs in those classes).

How does using the new generics API look? For a simple example with just one type parameter, see [CollectionSerializer|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/CollectionSerializer.java#L80-L97]. To get the class, call kryo.getGenerics().nextGenericClass() then after reading the child objects call kryo.getGenerics().popGenericType(). Simple!

nextGenericClass() is a shortcut for the common case of a class with a single type parameter. When there are multiple type parameters, like [Map|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/MapSerializer.java#L79-L89], use nextGenericTypes() instead, then call resolve() to get each class. Also, the last parameter is made current automatically (meaning it is pushed by Generics#nextGenericTypes()), so the other parameter(s) have to be [pushed and popped manually|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/MapSerializer.java#L93-L101]. Somewhat less simple!

It's a little tricky, but these two patterns are all serializers need to worry about. It provides full support for nested generic types, eg HashMap<ArrayList<Integer>, ArrayList<String>>, which wasn't possible before.

Feedback on all this is welcome, but the core of it is relatively complex and may induce a headache (it certainly did when writing it!). The important bits are 1) to handle all generics scenarios, and 2) to minimize work at serialization time. Digging through just the calls for generics made from serializers, you'll find minimal work is done and without allocations. Given this, generics are always enabled.


 * Serializer method signatures have changed for [issue #146|https://github.com/EsotericSoftware/kryo/pull/146] in [this commit|https://github.com/EsotericSoftware/kryo/commit/d22a23dccd9b38461950084188f69597f29c4a9a]. All Serializer classes need to be changed from Class<T> to Class<? extends T>.


 * Various serialization improvements. Eg, TaggedFieldSerializer now has acceptsNull=true, which saves 1 byte for all non-null objects.


 * Unsafe had permeated the API. IMO it should be sandboxed as much as possible. I began refactoring by removing Unsafe support completely, made everything nice, then put back the input/output streams. I haven't yet looked into what serializers would need to make use of all of the Unsafe features. With Java 9 dropping Unsafe, maybe it is not worth the considerable effort to support it. FWIW, I personally don't use Unsafe but I know others do. I assume some even choosing Kryo specifically for Unsafe. I don't know how those people feel about Java 9.

I have not looked at Java 9 at all yet. I don't know if it provides APIs that Kryo can use to be more efficient.


 * FieldSerializer had gotten messy internally. It did a lot more work than necessary to build the cached fields. It did things with generics that were suspect and did expensive computations that were not even used. It had a lot of Unsafe logic.


 * Various API improvements. The Kryo class must not become a junk drawer for serializer settings. The SerializerFactory classes can be used, eg [FieldSerializerFactory|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/SerializerFactory.java#L96-L98], which can create new, configured serializers ([example|http://n4te.com/x/367-0yar.txt]). Having Input/Output varint and varlong be a hint was odd. If this ends up being needed again, maybe depending on what happens with Unsafe, it could be done in Input/Output subclasses without affecting the base class API. Javadocs are much improved.


 * Logging is improved. Some useless junk was logged, some important junk was not logged, and the formatting of log messages was not consistent.


 * Deserialization still [temporarily modifies|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/io/Input.java#L604-L606] the input buffer. While in many cases this is fine, it can be quite an unexpected gotcha in some cases. I'd like to remedy this but string writing is important and any changes here need extensive benchmarks.

> Updated to Kryo 5.0.3
> ---------------------
>
>                 Key: SPARK-34023
>                 URL: https://issues.apache.org/jira/browse/SPARK-34023
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Nick Nezis
>            Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org