Posted to issues@flink.apache.org by twalthr <gi...@git.apache.org> on 2017/07/31 18:24:59 UTC

[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/4441

    [FLINK-7301] [docs] Rework state documentation

    ## What is the purpose of the change
    
    *This PR restructures the state-related documentation pages. It introduces a state introduction page and moves some files (from `setup/` to `ops/`) according to the new documentation structure.*
    
    ## Brief change log
    
    *Documentation changes only.*
    
    ## Verifying this change
    
    *Built with the build script and checked the links.*
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? no
      - If yes, how is the feature documented? not applicable
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink FLINK-7301

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4441.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4441
    
----
commit b53e758216364903f211277052dbba4ae99da7d3
Author: twalthr <tw...@apache.org>
Date:   2017-07-31T18:14:31Z

    [FLINK-7301] [docs] Rework state documentation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

Posted by alpinegizmo <gi...@git.apache.org>.
Github user alpinegizmo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4441#discussion_r131086039
  
    --- Diff: docs/dev/stream/state/custom_serialization.md ---
    @@ -0,0 +1,188 @@
    +---
    +title: "Custom Serialization for Managed State"
    +nav-title: "Custom Serialization"
    +nav-parent_id: streaming_state
    +nav-pos: 10
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +If your application uses Flink's managed state, it might be necessary to implement a custom serialization logic for special use cases.
    +
    +This page is targeted as a guideline for users who require the use of custom serialization for their state, covering how
    +to provide a custom serializer and how to handle upgrades to the serializer for compatibility. If you're simply using
    +Flink's own serializers, this page is irrelevant and can be skipped.
    +
    +### Using custom serializers
    +
    +As demonstrated in the above examples, when registering a managed operator or keyed state, a `StateDescriptor` is required
    +to specify the state's name, as well as information about the type of the state. The type information is used by Flink's
    +[type serialization framework](../../types_serialization.html) to create appropriate serializers for the state.
    +
    +It is also possible to completely bypass this and let Flink use your own custom serializer to serialize managed states,
    +simply by directly instantiating the `StateDescriptor` with your own `TypeSerializer` implementation:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +public class CustomTypeSerializer extends TypeSerializer<Tuple2<String, Integer>> {...};
    +
    +ListStateDescriptor<Tuple2<String, Integer>> descriptor =
    +    new ListStateDescriptor<>(
    +        "state-name",
    +        new CustomTypeSerializer());
    +
    +checkpointedState = getRuntimeContext().getListState(descriptor);
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +class CustomTypeSerializer extends TypeSerializer[(String, Integer)] {...}
    +
    +val descriptor = new ListStateDescriptor[(String, Integer)](
    +    "state-name",
    +    new CustomTypeSerializer
    +)
    +
    +checkpointedState = getRuntimeContext.getListState(descriptor);
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +Note that Flink writes state serializers along with the state as metadata. In certain cases on restore (see following
    +subsections), the written serializer needs to be deserialized and used. Therefore, it is recommended to avoid using
    +anonymous classes as your state serializers. Anonymous classes do not have a guarantee on the generated classname,
    +varying across compilers and depends on the order that they are instantiated within the enclosing class, which can 
    --- End diff --
    
    "varying across compilers and depends" ==> "which varies across compilers and depends"

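The paragraph quoted above recommends against anonymous classes as state serializers because their generated class names are not guaranteed. A minimal plain-Java sketch of that point follows. It does not use Flink: `DemoSerializer`, `Utf8Serializer`, and `SerializerNaming` are hypothetical names made up for illustration, and `DemoSerializer` is only a stand-in for a real serializer type.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a serializer type; this is NOT Flink's TypeSerializer.
interface DemoSerializer {
    byte[] serialize(String value);
}

class SerializerNaming {

    // A named nested class: its class name is chosen by the author and stays stable.
    static final class Utf8Serializer implements DemoSerializer {
        @Override
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
    }

    static DemoSerializer named() {
        return new Utf8Serializer();
    }

    // An anonymous class: the compiler assigns its name, so the name can change
    // when other anonymous classes are added to or reordered in the enclosing class.
    static DemoSerializer anonymous() {
        return new DemoSerializer() {
            @Override
            public byte[] serialize(String value) {
                return value.getBytes(StandardCharsets.UTF_8);
            }
        };
    }

    public static void main(String[] args) {
        // Print the runtime class names to compare the two cases.
        System.out.println(named().getClass().getName());
        System.out.println(anonymous().getClass().getName());
    }
}
```

With `javac`, the anonymous class typically gets a numbered name such as `SerializerNaming$1`, but that numbering is an implementation detail. Anything that looks the class up by name later, as restoring a written serializer would, is therefore safer with the named class.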

[GitHub] flink issue #4441: [FLINK-7301] [docs] Rework state documentation

Posted by alpinegizmo <gi...@git.apache.org>.
Github user alpinegizmo commented on the issue:

    https://github.com/apache/flink/pull/4441
  
    @twalthr Duh, of course, you're right. 
    
    +1


[GitHub] flink issue #4441: [FLINK-7301] [docs] Rework state documentation

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/4441
  
    @alpinegizmo I thought about adding redirects, but we would end up in redirect hell if we added one for every page that moves in the future. Actually, only links to the master docs change, and we should not use links to the master docs in trainings or on Stack Overflow anyway. Proper links to released docs remain unchanged.


[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

Posted by alpinegizmo <gi...@git.apache.org>.
Github user alpinegizmo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4441#discussion_r131087040
  
    --- Diff: docs/dev/stream/state/index.md ---
    @@ -0,0 +1,56 @@
    +---
    +title: "State & Fault Tolerance"
    +nav-id: streaming_state
    +nav-title: "State & Fault Tolerance"
    +nav-parent_id: streaming
    +nav-pos: 3
    +nav-show_overview: true
    +---
    +
    +Stateful functions and operators store data across the processing of individual elements/events, making state a critical building block for
    +any type of more elaborate operation.
    +
    +For example:
    +
    +  - When an application searches for certain event patterns, the state will store the sequence of events encountered so far.
    +  - When aggregating events per minute/hour/day, the state holds the pending aggregates.
    +  - When training a machine learning model over a stream of data points, the state holds the current version of the model parameters.
    +  - When historic data needs to be managed, the state allows efficient access to events that occurred in the past.
    +
    +Flink needs to be aware of the state in order to make state fault tolerant using [checkpoints](checkpointing.html) and allow [savepoints]({{ site.baseurl }}/ops/state/savepoints.html) of streaming applications.
    --- End diff --
    
    "and to allow [savepoints]"

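As a rough illustration of the second bullet in the quoted list (state holding pending aggregates) and of why that state has to be captured for fault tolerance, here is a plain-Java sketch. It is not Flink code: `PendingCounts`, `snapshot`, and `restore` are hypothetical names, and the in-memory copy only stands in for what a checkpoint or savepoint would persist.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; none of these names are Flink APIs.
class PendingCounts {

    // The "state": one pending count per key, kept across individual events.
    private Map<String, Long> counts = new HashMap<>();

    // Process one event by updating the aggregate for its key.
    void onEvent(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    long countFor(String key) {
        return counts.getOrDefault(key, 0L);
    }

    // Copy the state so it can be kept elsewhere (stand-in for a checkpoint).
    Map<String, Long> snapshot() {
        return new HashMap<>(counts);
    }

    // Replace the state with a previously taken copy (stand-in for recovery).
    void restore(Map<String, Long> saved) {
        counts = new HashMap<>(saved);
    }

    public static void main(String[] args) {
        PendingCounts op = new PendingCounts();
        op.onEvent("a");
        op.onEvent("a");
        op.onEvent("b");
        Map<String, Long> saved = op.snapshot();
        op.onEvent("a");      // progress made after the snapshot
        op.restore(saved);    // simulated failure and recovery
        System.out.println(op.countFor("a")); // prints 2
    }
}
```

The sketch only works because the operator exposes its state for copying. That is the same reason the quoted text gives for Flink needing to be aware of the state.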

[GitHub] flink issue #4441: [FLINK-7301] [docs] Rework state documentation

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/4441
  
    CC @alpinegizmo 


[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

Posted by alpinegizmo <gi...@git.apache.org>.
Github user alpinegizmo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4441#discussion_r131085493
  
    --- Diff: docs/dev/stream/state/custom_serialization.md ---
    @@ -0,0 +1,188 @@
    +---
    +title: "Custom Serialization for Managed State"
    +nav-title: "Custom Serialization"
    +nav-parent_id: streaming_state
    +nav-pos: 10
    +---
    +
    +If your application uses Flink's managed state, it might be necessary to implement a custom serialization logic for special use cases.
    --- End diff --
    
    drop the word "a" in "implement a custom serialization logic" so that it reads "implement custom serialization logic"


[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4441


[GitHub] flink issue #4441: [FLINK-7301] [docs] Rework state documentation

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/4441
  
    Thanks @alpinegizmo. I will merge this now.

