Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/09/30 20:48:00 UTC

[jira] [Created] (HUDI-4959) Serializing HoodieKey objects using Kryo fails to deserialize data back w/o prior registration

Alexey Kudinkin created HUDI-4959:
-------------------------------------

             Summary: Serializing HoodieKey objects using Kryo fails to deserialize data back w/o prior registration
                 Key: HUDI-4959
                 URL: https://issues.apache.org/jira/browse/HUDI-4959
             Project: Apache Hudi
          Issue Type: Bug
          Components: writer-core
    Affects Versions: 0.12.0
            Reporter: Alexey Kudinkin
            Assignee: Alexey Kudinkin
             Fix For: 0.13.0


By default, Kryo (used in SerializationUtils) allows objects to be serialized without their classes being registered with Kryo beforehand. In that case, Kryo encodes the first occurrence of an object of a particular class with the full class name, while subsequent occurrences use a class id that is associated with the class on the fly.

This poses issues for durable serialization, i.e. when we persist such a serialized layout. In this case we are trying to deserialize a file that does not have the class name encoded, and since the user is running a different Spark job to read it, the class-name-to-id association is not preserved in memory either.
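
To make the failure mode concrete, here is a toy model in plain Java of a "class name on first occurrence, class id afterwards" encoding. It is not Kryo's actual wire format or API; the class and method names (NameIdStreamSketch, SketchWriter, SketchReader) are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Kryo's real format. It mimics the behaviour described
// above, where a class name is written once and later replaced by an id.
final class NameIdStreamSketch {

    static final class SketchWriter {
        // In-memory association, assigned on the fly and never persisted.
        private final Map<String, Integer> nameToId = new HashMap<>();

        byte[] write(String className, String payload) {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bytes);
                Integer id = nameToId.get(className);
                if (id == null) {
                    // First occurrence: emit the id together with the full class name.
                    id = nameToId.size();
                    nameToId.put(className, id);
                    out.writeBoolean(true);
                    out.writeInt(id);
                    out.writeUTF(className);
                } else {
                    // Later occurrences: emit the id only.
                    out.writeBoolean(false);
                    out.writeInt(id);
                }
                out.writeUTF(payload);
                out.flush();
                return bytes.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    static final class SketchReader {
        // Populated only when a record carrying the class name has been read.
        private final Map<Integer, String> idToName = new HashMap<>();

        String readClassName(byte[] record) {
            try {
                DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
                boolean hasName = in.readBoolean();
                int id = in.readInt();
                if (hasName) {
                    String name = in.readUTF();
                    idToName.put(id, name);
                    return name;
                }
                String name = idToName.get(id);
                if (name == null) {
                    throw new IllegalStateException("Unknown class id " + id);
                }
                return name;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter();
        byte[] first = writer.write("org.example.Key", "a");
        byte[] second = writer.write("org.example.Key", "b");

        // A reader that saw the first record can resolve the second one.
        SketchReader sameSession = new SketchReader();
        System.out.println(sameSession.readClassName(first));
        System.out.println(sameSession.readClassName(second));

        // A fresh reader (think: a different job) given only the second record cannot.
        try {
            new SketchReader().readClassName(second);
        } catch (IllegalStateException e) {
            System.out.println("fresh reader failed: " + e.getMessage());
        }
    }
}
```

In this model the second record is only decodable by a reader that has already seen the first one, which mirrors the situation above where a persisted file is read back by a separate job.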

*NOTE: We should be using custom serialization sequences for every object we serialize for durable persistence, and avoid using frameworks like Kryo for that.*
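
As a rough illustration of what a custom serialization sequence could look like, the sketch below writes a key made of two strings in a fixed, versioned layout using only the JDK. It is not the fix that was applied for this issue; the class name KeyCodecSketch, the version byte, and the choice of a length-prefixed UTF-8 layout are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical codec: a fixed byte layout that depends on no in-memory
// state, so any process can decode what another process wrote.
final class KeyCodecSketch {
    // Assumed version marker so the layout can evolve later.
    static final byte FORMAT_VERSION = 1;

    // Layout: [version:1][len:4][recordKey utf8][len:4][partitionPath utf8]
    static byte[] encode(String recordKey, String partitionPath) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(FORMAT_VERSION);
            writeString(out, recordKey);
            writeString(out, partitionPath);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {recordKey, partitionPath}.
    static String[] decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte version = in.readByte();
            if (version != FORMAT_VERSION) {
                throw new IllegalArgumentException("Unsupported format version: " + version);
            }
            String recordKey = readString(in);
            String partitionPath = readString(in);
            return new String[] {recordKey, partitionPath};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeString(DataOutputStream out, String value) throws IOException {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    private static String readString(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative string length: " + length);
        }
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = encode("uuid-123", "2022/09/30");
        String[] decoded = decode(data);
        System.out.println(decoded[0] + " @ " + decoded[1]);
    }
}
```

Because every field is written explicitly and no class name or class id appears in the bytes, decoding does not depend on what the writing process had previously serialized.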



--
This message was sent by Atlassian Jira
(v8.20.10#820010)