Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/20 13:37:00 UTC

[jira] [Commented] (FLINK-8715) RocksDB does not propagate reconfiguration of serializer to the states

    [ https://issues.apache.org/jira/browse/FLINK-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445733#comment-16445733 ] 

ASF GitHub Bot commented on FLINK-8715:
---------------------------------------

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/5885

    [FLINK-8715] Remove usage of StateDescriptor in state handles

    ## What is the purpose of the change
    
    This PR is WIP, and is still lacking test coverage.
    It is opened now to collect some feedback for a proposed solution for FLINK-8715.
    
    Previously, state serializers that were reconfigured on restore were not properly forwarded to the state handles. In the past, the `StateDescriptor` served as the holder for the reconfigured serializer.
    However, since 88ffad27, `StateDescriptor#getSerializer()` hands out duplicates of the serializer, so the reconfigured serializer ends up being a completely different copy than the one the state handles are using.
    
    This fix corrects this by explicitly forwarding the serializer to the instantiated state handles after the state is registered at the state backend. It also eliminates the use of `StateDescriptor`s internally in the state handles, so that the behaviour is independent of the `StateDescriptor#getSerializer()` method's implementation.
    
    The alternative to this approach is to add an internal `setSerializer` method to the `StateDescriptor`, to be called after the state serializers are reconfigured on registration.
    That would ensure that serializers handed out by the descriptor are always reconfigured once the descriptor is registered at the backend.
    
    ## Brief change log
    
    - Remove `StateDescriptor`s from heap / RocksDB state handle classes
    - Forward the state serializer and any other necessary information provided by the state descriptor (e.g. default value, user functions, nested serializers, etc.) when instantiating state handles.
    
    ## Verifying this change
    
    This fix still lacks test coverage.
    It has been opened to collect feedback for the approach.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (**yes** / no / don't know)
      - The runtime per-record code paths (performance sensitive): (**yes** / no / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
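
The problem and the fix described under "What is the purpose of the change" can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySerializer`, `ToyDescriptor`, `ToyStateHandle` and `ToyBackend` are hypothetical stand-ins invented for illustration, not Flink classes, and "reconfiguration" is reduced to a boolean flag. The only behaviour taken from the description above is that the descriptor hands out a duplicate on every `getSerializer()` call and that the backend reconfigures a serializer when the state is registered.

```java
/** Toy model only: none of these classes are Flink's real ones. */
public class SerializerForwardingSketch {

    /** Hypothetical stand-in for a serializer that can be reconfigured on restore. */
    static final class ToySerializer {
        private boolean reconfigured;

        void reconfigure() { reconfigured = true; }

        boolean isReconfigured() { return reconfigured; }

        /** Returns an independent copy carrying the flag as it is at call time. */
        ToySerializer duplicate() {
            ToySerializer copy = new ToySerializer();
            copy.reconfigured = reconfigured;
            return copy;
        }
    }

    /** Hypothetical descriptor that hands out a duplicate on every call. */
    static final class ToyDescriptor {
        private final ToySerializer original = new ToySerializer();

        ToySerializer getSerializer() { return original.duplicate(); }
    }

    /** Hypothetical state handle that only knows the serializer it was given. */
    static final class ToyStateHandle {
        private final ToySerializer serializer;

        ToyStateHandle(ToySerializer serializer) { this.serializer = serializer; }

        boolean usesReconfiguredSerializer() { return serializer.isReconfigured(); }
    }

    /** Hypothetical backend that reconfigures the serializer at registration. */
    static final class ToyBackend {

        /** Registration: obtain a serializer from the descriptor and reconfigure it. */
        ToySerializer register(ToyDescriptor descriptor) {
            ToySerializer serializer = descriptor.getSerializer();
            serializer.reconfigure();
            return serializer;
        }

        /** Problematic path: the handle asks the descriptor again and gets a fresh duplicate. */
        ToyStateHandle createHandleFromDescriptor(ToyDescriptor descriptor) {
            register(descriptor);
            return new ToyStateHandle(descriptor.getSerializer());
        }

        /** Forwarding path: the reconfigured instance is passed to the handle explicitly. */
        ToyStateHandle createHandleWithForwarding(ToyDescriptor descriptor) {
            ToySerializer reconfigured = register(descriptor);
            return new ToyStateHandle(reconfigured);
        }
    }

    public static void main(String[] args) {
        ToyBackend backend = new ToyBackend();

        ToyStateHandle viaDescriptor = backend.createHandleFromDescriptor(new ToyDescriptor());
        ToyStateHandle viaForwarding = backend.createHandleWithForwarding(new ToyDescriptor());

        System.out.println(viaDescriptor.usesReconfiguredSerializer()); // prints "false"
        System.out.println(viaForwarding.usesReconfiguredSerializer()); // prints "true"
    }
}
```

In this toy model the forwarding path does not depend on how the descriptor implements `getSerializer()`, which is the property the pull request aims for. The `setSerializer` alternative mentioned in the description would instead store the reconfigured instance back into the descriptor, so that later duplicates are made from it.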


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-8715

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5885
    
----
commit c092dd6518d9e6f47f4cfc797c18bedc8a89cc05
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-20T13:15:42Z

    [FLINK-8715] Remove usage of StateDescriptor in state handles

----


> RocksDB does not propagate reconfiguration of serializer to the states
> ----------------------------------------------------------------------
>
>                 Key: FLINK-8715
>                 URL: https://issues.apache.org/jira/browse/FLINK-8715
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.2
>            Reporter: Arvid Heise
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> Any changes to the serializer done in #ensureCompatibility are lost during the state creation.
> In particular, [https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java#L68] always uses a fresh copy of the StateDescriptor.
> An easy fix is to pass the reconfigured serializer as an additional parameter in [https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L1681], which can be retrieved through the side-output of getColumnFamily:
> {code:java}
> kvStateInformation.get(stateDesc.getName()).f1.getStateSerializer()
> {code}
> I encountered it in 1.3.2 but the code in the master seems unchanged (hence the pointer into master). I encountered it in ValueState, but I suspect the same issue can be observed for all kinds of RocksDB states.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)