Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/06/25 10:56:00 UTC

[jira] [Commented] (FLINK-9654) Internal error while deserializing custom Scala TypeSerializer instances

    [ https://issues.apache.org/jira/browse/FLINK-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522143#comment-16522143 ] 

ASF GitHub Bot commented on FLINK-9654:
---------------------------------------

GitHub user zsolt-donca opened a pull request:

    https://github.com/apache/flink/pull/6206

    [FLINK-9654] [core] Changed the check for anonymous classes to avoid InternalError

    …SI-2034.
    
    ## What is the purpose of the change
    
    This pull request avoids triggering [SI-2034](https://issues.scala-lang.org/browse/SI-2034) for Scala classes that are defined inside Scala objects. The underlying issue will only go away once Scala officially supports Java 9, since the JDK bug behind it is fixed in Java 9: https://bugs.openjdk.java.net/browse/JDK-8057919.
    
    As explained in [FLINK-9654](https://issues.apache.org/jira/browse/FLINK-9654), whenever there is a custom `TypeSerializer` implementation that, when serialized, has in its object graph a reference to a class that triggers [SI-2034](https://issues.scala-lang.org/browse/SI-2034), it makes the task manager instance fail, potentially bringing down the entire Flink cluster.
    
    ## Brief change log
      - moved the class-name checks *before* the call to `Class.isAnonymousClass`, since there is no reason to call it when the class name alone already gives the answer;
      - added a check for "$macro$" to the name checks, since macro-generated classes are always anonymous;
      - wrapped the call to `isAnonymousClass` in a try-catch block to catch the `InternalError` that the issue might trigger.
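
The three changes listed above can be pictured with the following minimal sketch. It is not the code from the pull request: the class name `AnonymousClassCheck`, the stand-in class `Fake$macro$1`, the extra `"$anon$"` and `"$anonfun"` markers, and the choice to return `false` when `InternalError` is caught are all assumptions made for illustration. Only the `"$macro$"` marker, the ordering of the checks, and the try-catch around `isAnonymousClass` come from the description.

```java
// Illustrative sketch only; names and fallback behavior are assumptions.
final class AnonymousClassCheck {

    // "$macro$" is named in the change log; the other two markers are assumed here.
    private static final String[] GENERATED_NAME_MARKERS = {"$anon$", "$anonfun", "$macro$"};

    /** Hypothetical stand-in for a macro-generated class; only its name matters. */
    static final class Fake$macro$1 {}

    private AnonymousClassCheck() {}

    static boolean isAnonymousClass(Class<?> clazz) {
        String name = clazz.getName();

        // 1. Cheap name-based checks first, so the reflective call is skipped when possible.
        for (String marker : GENERATED_NAME_MARKERS) {
            if (name.contains(marker)) {
                return true;
            }
        }

        // 2. Fall back to reflection, guarding against the InternalError from SI-2034.
        try {
            return clazz.isAnonymousClass();
        } catch (InternalError e) {
            // Assumption of this sketch: treat the class as not anonymous.
            return false;
        }
    }

    public static void main(String[] args) {
        Runnable anonymous = new Runnable() {
            @Override
            public void run() {}
        };
        System.out.println(isAnonymousClass(anonymous.getClass())); // true
        System.out.println(isAnonymousClass(Fake$macro$1.class));   // true
        System.out.println(isAnonymousClass(String.class));         // false
    }
}
```

Catching `InternalError` is normally discouraged, but here it is confined to one reflective call whose failure mode is known, which keeps the error from escaping into the surrounding deserialization.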
    
    ## Verifying this change
    
    This change is a trivial rework / code cleanup without any test coverage.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? no
      - If yes, how is the feature documented? not applicable


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsolt-donca/flink FLINK-9654-internal-error-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6206
    
----
commit fbad06a398fe58f8d312e0ed4dc6bdd31ac65d08
Author: Zsolt Donca <zs...@...>
Date:   2018-06-25T07:43:29Z

    FLINK-9654 Changed the way we check if a class is anonymous to avoid SI-2034.

----


> Internal error while deserializing custom Scala TypeSerializer instances
> ------------------------------------------------------------------------
>
>                 Key: FLINK-9654
>                 URL: https://issues.apache.org/jira/browse/FLINK-9654
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Zsolt Donca
>            Priority: Major
>              Labels: pull-request-available
>
> When you are using custom `TypeSerializer` instances implemented in Scala, the Scala issue [SI-2034|https://issues.scala-lang.org/browse/SI-2034] can manifest itself when a Flink job is restored from a checkpoint or started with a savepoint.
> The reason is that when restoring from a checkpoint or savepoint, Flink uses `InstantiationUtil.FailureTolerantObjectInputStream` to deserialize the type serializers and their configurations. The deserialization walks through the entire corresponding object graph, and for each class it calls `isAnonymousClass`, which in turn calls `getSimpleName` (a mechanism in place for FLINK-6869). If there is a class nested in a Scala object for which `getSimpleName` fails (see the Scala issue), a `java.lang.InternalError` is thrown, which causes the task manager to restart. Flink then tries to restart the job on another task manager, so all the task managers end up restarting, wreaking havoc on the entire Flink cluster.
> There are some alternative type information derivation mechanisms that rely on anonymous classes and, most importantly, classes generated by macros, that can easily trigger the above problem. I am personally working on [https://github.com/zsolt-donca/flink-alt], and there is also [https://github.com/joroKr21/flink-shapeless].
> I prepared a pull request that fixes the issue. 
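
To illustrate why a per-class check runs for every class in a serializer's object graph, here is a rough Java sketch of an `ObjectInputStream` that hooks class resolution. It is not Flink's `FailureTolerantObjectInputStream`: the class `InspectingObjectInputStream`, the payload types `DemoSerializer` and `Config`, and the `roundTrip` helper are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the stream implementation used by Flink.
final class InspectingObjectInputStream extends ObjectInputStream {

    /** Names of all classes resolved while reading the stream. */
    final List<String> inspected = new ArrayList<>();

    InspectingObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        Class<?> clazz = super.resolveClass(desc);
        // A per-class check placed here runs for every class descriptor in the stream;
        // an Error thrown from such a check would abort the whole read.
        inspected.add(clazz.getName());
        return clazz;
    }

    /** Hypothetical nested configuration object reachable from the serializer. */
    static final class Config implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name = "demo";
    }

    /** Hypothetical stand-in for a custom serializer with a nested reference. */
    static final class DemoSerializer implements Serializable {
        private static final long serialVersionUID = 1L;
        final Config config = new Config();
    }

    /** Serializes the value, reads it back, and returns the class names that were resolved. */
    static List<String> roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (InspectingObjectInputStream in = new InspectingObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
                return in.inspected;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the names of both DemoSerializer and the Config it references.
        System.out.println(roundTrip(new DemoSerializer()));
    }
}
```

The point of the sketch is that the hook sees not just the top-level serializer class but also every class it references, so a single problematic class anywhere in the graph is enough to hit the failing check.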



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)