Posted to issues@flink.apache.org by azagrebin <gi...@git.apache.org> on 2018/05/02 09:58:09 UTC

[GitHub] flink pull request #5947: [FLINK-8978] Stateful generic stream job upgrade e...

GitHub user azagrebin opened a pull request:

    https://github.com/apache/flink/pull/5947

    [FLINK-8978] Stateful generic stream job upgrade e2e test

    ## What is the purpose of the change
    
    e2e test for generic stateful job upgrade and state recovery based on operator uid.
    
    ## Brief change log
    
      - extract `DataStreamAllroundTestJobFactory` from `DataStreamAllroundTestProgram` to reuse generic components and their cli config.
      - extract some code snippets from `test-scripts/test_resume_savepoint.sh` into functions in `test-scripts/common.sh` to reuse them in this new test
      - add `flink-stream-stateful-job-upgrade-test` module to flink e2e tests
      - add `StatefulStreamJobUpgradeTestProgram` which is constructed from `DataStreamAllroundTestJobFactory`
      - add `test-scripts/test_stateful_stream_job_upgrade.sh` to `run-nightly-tests.sh`
    
    ## Verifying this change
    
    Run from the Flink repo root:
    ```bash
    $ mvn clean package
    $ FLINK_DIR=build-target flink-end-to-end-tests/test-scripts/test_stateful_stream_job_upgrade.sh
    ```
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (JavaDocs)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/azagrebin/flink FLINK-8978

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5947
    
----
commit 4282b9cee00dafd4963a3a9902e63cc0fa77d385
Author: Andrey Zagrebin <an...@...>
Date:   2018-04-30T18:25:53Z

    [FLINK-8978] Stateful generic stream job upgrade e2e test

----


---

[GitHub] flink issue #5947: [FLINK-8978] Stateful generic stream job upgrade e2e test

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/5947
  
    LGTM 👍 Will merge this.


---

[GitHub] flink pull request #5947: [FLINK-8978] Stateful generic stream job upgrade e...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5947#discussion_r185537704
  
    --- Diff: flink-end-to-end-tests/flink-stream-stateful-job-upgrade-test/src/main/java/org/apache/flink/streaming/tests/StatefulStreamJobUpgradeTestProgram.java ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.streaming.tests;
    +
    +import org.apache.flink.api.common.functions.JoinFunction;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.common.typeutils.TypeSerializer;
    +import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
    +import org.apache.flink.api.java.utils.ParameterTool;
    +import org.apache.flink.configuration.ConfigOption;
    +import org.apache.flink.configuration.ConfigOptions;
    +import org.apache.flink.streaming.api.datastream.KeyedStream;
    +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
    +import org.apache.flink.streaming.tests.artificialstate.eventpayload.ComplexPayload;
    +
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createArtificialKeyedStateMapper;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createEventSource;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createSemanticsCheckMapper;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createTimestampExtractor;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.setupEnvironment;
    +
    +import java.util.Collections;
    +import java.util.List;
    +
    +/**
    + * Test upgrade of generic stateful job for Flink's DataStream API operators and primitives.
    + *
    + * <p>The job is constructed of generic components from {@link DataStreamAllroundTestJobFactory}.
    + * The gaol is to test successful state restoration after taking savepoint and recovery with new job version.
    + * It can be configured with '--test.job.variant' to run different variants of it:
    + * <ul>
    + *     <li><b>original:</b> includes 2 custom stateful map operators</li>
    + *     <li><b>upgraded:</b> changes order of 2 custom stateful map operators and adds one more</li>
    + * </ul>
    + */
    --- End diff --
    
    I think we should note in the comment on the job classes that all possible configuration options are listed in the comment of `DataStreamAllroundTestJobFactory`, so that users can easily find them.


---

[GitHub] flink issue #5947: [FLINK-8978] Stateful generic stream job upgrade e2e test

Posted by azagrebin <gi...@git.apache.org>.
Github user azagrebin commented on the issue:

    https://github.com/apache/flink/pull/5947
  
    Thanks for the review and the good points, @StefanRRichter.
    I updated the PR to address the comments.
    The resume state e2e test now also checks operator <-> state correspondence upon state restoration.
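
The operator <-> state correspondence mentioned here can be pictured as a lookup of savepoint state by operator uid. The following is a plain Java simulation of that idea, not Flink code: the class `UidStateRestoreSketch`, the method `restore`, and the uids `stateMap1` to `stateMap3` are hypothetical names chosen for illustration. It mirrors the "upgraded" variant, which reorders the two stateful map operators and adds a third.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UidStateRestoreSketch {

    /**
     * Maps each operator uid of the new job to the state stored under the
     * same uid in the savepoint, or to null if the operator is new.
     */
    static Map<String, String> restore(Map<String, String> savepoint, List<String> newJobUids) {
        Map<String, String> restored = new LinkedHashMap<>();
        for (String uid : newJobUids) {
            // The lookup depends only on the uid, not on the operator's position.
            restored.put(uid, savepoint.get(uid));
        }
        return restored;
    }

    public static void main(String[] args) {
        // State written by the "original" job: two stateful map operators.
        Map<String, String> savepoint = new LinkedHashMap<>();
        savepoint.put("stateMap1", "state-of-map-1");
        savepoint.put("stateMap2", "state-of-map-2");

        // "Upgraded" job: the two operators swap places and a third is added.
        List<String> upgradedUids = Arrays.asList("stateMap2", "stateMap1", "stateMap3");

        System.out.println(restore(savepoint, upgradedUids));
    }
}
```

Because the lookup ignores position, the reordered operators still receive their own state, and the added operator starts empty.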


---

[GitHub] flink pull request #5947: [FLINK-8978] Stateful generic stream job upgrade e...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5947#discussion_r185537151
  
    --- Diff: flink-end-to-end-tests/flink-stream-stateful-job-upgrade-test/src/main/java/org/apache/flink/streaming/tests/StatefulStreamJobUpgradeTestProgram.java ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.streaming.tests;
    +
    +import org.apache.flink.api.common.functions.JoinFunction;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.common.typeutils.TypeSerializer;
    +import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
    +import org.apache.flink.api.java.utils.ParameterTool;
    +import org.apache.flink.configuration.ConfigOption;
    +import org.apache.flink.configuration.ConfigOptions;
    +import org.apache.flink.streaming.api.datastream.KeyedStream;
    +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
    +import org.apache.flink.streaming.tests.artificialstate.eventpayload.ComplexPayload;
    +
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createArtificialKeyedStateMapper;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createEventSource;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createSemanticsCheckMapper;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createTimestampExtractor;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.setupEnvironment;
    +
    +import java.util.Collections;
    +import java.util.List;
    +
    +/**
    + * Test upgrade of generic stateful job for Flink's DataStream API operators and primitives.
    + *
    + * <p>The job is constructed of generic components from {@link DataStreamAllroundTestJobFactory}.
    + * The gaol is to test successful state restoration after taking savepoint and recovery with new job version.
    --- End diff --
    
    typo `gaol`


---

[GitHub] flink issue #5947: [FLINK-8978] Stateful generic stream job upgrade e2e test

Posted by azagrebin <gi...@git.apache.org>.
Github user azagrebin commented on the issue:

    https://github.com/apache/flink/pull/5947
  
    Agreed, I added a constant check of the previous non-null state to its update method.


---

[GitHub] flink pull request #5947: [FLINK-8978] Stateful generic stream job upgrade e...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5947


---

[GitHub] flink issue #5947: [FLINK-8978] Stateful generic stream job upgrade e2e test

Posted by azagrebin <gi...@git.apache.org>.
Github user azagrebin commented on the issue:

    https://github.com/apache/flink/pull/5947
  
    cc @StefanRRichter 


---

[GitHub] flink pull request #5947: [FLINK-8978] Stateful generic stream job upgrade e...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5947#discussion_r185541401
  
    --- Diff: flink-end-to-end-tests/flink-stream-stateful-job-upgrade-test/src/main/java/org/apache/flink/streaming/tests/StatefulStreamJobUpgradeTestProgram.java ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.streaming.tests;
    +
    +import org.apache.flink.api.common.functions.JoinFunction;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.common.typeutils.TypeSerializer;
    +import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
    +import org.apache.flink.api.java.utils.ParameterTool;
    +import org.apache.flink.configuration.ConfigOption;
    +import org.apache.flink.configuration.ConfigOptions;
    +import org.apache.flink.streaming.api.datastream.KeyedStream;
    +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
    +import org.apache.flink.streaming.tests.artificialstate.eventpayload.ComplexPayload;
    +
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createArtificialKeyedStateMapper;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createEventSource;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createSemanticsCheckMapper;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.createTimestampExtractor;
    +import static org.apache.flink.streaming.tests.DataStreamAllroundTestJobFactory.setupEnvironment;
    +
    +import java.util.Collections;
    +import java.util.List;
    +
    +/**
    + * Test upgrade of generic stateful job for Flink's DataStream API operators and primitives.
    + *
    + * <p>The job is constructed of generic components from {@link DataStreamAllroundTestJobFactory}.
    + * The gaol is to test successful state restoration after taking savepoint and recovery with new job version.
    + * It can be configured with '--test.job.variant' to run different variants of it:
    + * <ul>
    + *     <li><b>original:</b> includes 2 custom stateful map operators</li>
    + *     <li><b>upgraded:</b> changes order of 2 custom stateful map operators and adds one more</li>
    + * </ul>
    + */
    +public class StatefulStreamJobUpgradeTestProgram {
    +	private static final String TEST_JOB_VARIANT_ORIGINAL = "original";
    +	private static final String TEST_JOB_VARIANT_UPGRADED = "upgraded";
    +
    +	private static final JoinFunction<Event, ComplexPayload, ComplexPayload> SIMPLE_STATE_UPDATE =
    +		(Event first, ComplexPayload second) -> new ComplexPayload(first);
    +	private static final JoinFunction<Event, ComplexPayload, ComplexPayload> LAST_EVENT_STATE_UPDATE =
    +		(Event first, ComplexPayload second) ->
    +			(second != null && first.getEventTime() <= second.getEventTime()) ? second : new ComplexPayload(first);
    +
    +	private static final ConfigOption<String> TEST_JOB_VARIANT = ConfigOptions
    +		.key("test.job.variant")
    +		.defaultValue(TEST_JOB_VARIANT_ORIGINAL)
    +		.withDescription(String.format("This configures the job variant to test. Can be '%s' or '%s'",
    +			TEST_JOB_VARIANT_ORIGINAL, TEST_JOB_VARIANT_UPGRADED));
    +
    +	public static void main(String[] args) throws Exception {
    +		final ParameterTool pt = ParameterTool.fromArgs(args);
    +
    +		final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		setupEnvironment(env, pt);
    +
    +		KeyedStream<Event, Integer> source = env.addSource(createEventSource(pt))
    +			.assignTimestampsAndWatermarks(createTimestampExtractor(pt))
    +			.keyBy(Event::getKey);
    +
    +		List<TypeSerializer<ComplexPayload>> stateSer =
    +			Collections.singletonList(new KryoSerializer<>(ComplexPayload.class, env.getConfig()));
    +
    +		boolean isOriginal = pt.get(TEST_JOB_VARIANT.key()).equals(TEST_JOB_VARIANT_ORIGINAL);
    --- End diff --
    
    We could throw an `IllegalArgumentException` for any unexpected string.
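
One way to act on this suggestion is sketched below. It is a minimal, self-contained illustration rather than the code that was merged: the class `JobVariantCheck` and the method `isOriginalVariant` are hypothetical names, while the two variant strings come from the diff above.

```java
final class JobVariantCheck {
    static final String TEST_JOB_VARIANT_ORIGINAL = "original";
    static final String TEST_JOB_VARIANT_UPGRADED = "upgraded";

    /**
     * Returns true for the original variant and false for the upgraded one.
     * Any other value, including null, is rejected instead of being silently
     * treated as "upgraded".
     */
    static boolean isOriginalVariant(String variant) {
        if (TEST_JOB_VARIANT_ORIGINAL.equals(variant)) {
            return true;
        }
        if (TEST_JOB_VARIANT_UPGRADED.equals(variant)) {
            return false;
        }
        throw new IllegalArgumentException(String.format(
            "Unknown test job variant '%s', expected '%s' or '%s'",
            variant, TEST_JOB_VARIANT_ORIGINAL, TEST_JOB_VARIANT_UPGRADED));
    }

    public static void main(String[] args) {
        System.out.println(isOriginalVariant("original"));
        System.out.println(isOriginalVariant("upgraded"));
    }
}
```

With the plain `equals` comparison in the diff, a misspelled value such as `orignal` would run the upgraded job without any warning. The explicit check fails fast instead.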


---

[GitHub] flink pull request #5947: [FLINK-8978] Stateful generic stream job upgrade e...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5947#discussion_r185840213
  
    --- Diff: flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/artificialstate/eventpayload/ArtificialValueStateBuilder.java ---
    @@ -33,21 +33,28 @@
     	private static final long serialVersionUID = -1205814329756790916L;
     
     	private transient ValueState<STATE> valueState;
    +	private transient boolean afterRestoration;
     	private final TypeSerializer<STATE> typeSerializer;
     	private final JoinFunction<IN, STATE, STATE> stateValueGenerator;
    +	private final RestoredStateVerifier<STATE> restoredStateVerifier;
     
     	public ArtificialValueStateBuilder(
     		String stateName,
     		JoinFunction<IN, STATE, STATE> stateValueGenerator,
    -		TypeSerializer<STATE> typeSerializer) {
    -
    +		TypeSerializer<STATE> typeSerializer,
    +		RestoredStateVerifier<STATE> restoredStateVerifier) {
     		super(stateName);
     		this.typeSerializer = typeSerializer;
     		this.stateValueGenerator = stateValueGenerator;
    +		this.restoredStateVerifier = restoredStateVerifier;
     	}
     
     	@Override
     	public void artificialStateForElement(IN event) throws Exception {
    +		if (afterRestoration) {
    --- End diff --
    
    I find this way of checking the state rather invasive and not completely thorough. There is now a pretty tight coupling between creating artificial state and checking something about it on restore. In particular, the point at which the check happens is now hardcoded. This makes it harder to reuse the classes in further test jobs that we might want to build with them. Can't we use an approach that is based more on composition? For example, wrap the state builder in a state checker? The current code also performs only one check, so if the input element has a key that we never encountered, the state is `null` and there might be no check at all.
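
The composition idea could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's code: `StateBuilder`, `StateVerifier`, `CheckingStateBuilder`, and `LastValueBuilder` are hypothetical, simplified stand-ins, and a plain `HashMap` replaces Flink's keyed state. The wrapper verifies the current state for every element before delegating the update, so the builder itself contains no checking logic.

```java
import java.util.HashMap;
import java.util.Map;

final class CheckingStateBuilderSketch {

    /** Hypothetical, simplified stand-in for an artificial state builder. */
    interface StateBuilder<IN, STATE> {
        STATE currentState(IN event);
        void updateState(IN event);
    }

    /** Hypothetical verifier; it should throw if the state is not as expected. */
    interface StateVerifier<IN, STATE> {
        void verify(IN event, STATE state);
    }

    /** Decorator: checks the state for every element, then delegates the update. */
    static final class CheckingStateBuilder<IN, STATE> implements StateBuilder<IN, STATE> {
        private final StateBuilder<IN, STATE> delegate;
        private final StateVerifier<IN, STATE> verifier;

        CheckingStateBuilder(StateBuilder<IN, STATE> delegate, StateVerifier<IN, STATE> verifier) {
            this.delegate = delegate;
            this.verifier = verifier;
        }

        @Override
        public STATE currentState(IN event) {
            return delegate.currentState(event);
        }

        @Override
        public void updateState(IN event) {
            // The verifier also sees null state, i.e. keys never encountered before.
            verifier.verify(event, delegate.currentState(event));
            delegate.updateState(event);
        }
    }

    /** Minimal in-memory builder that uses the event itself as the key. */
    static final class LastValueBuilder implements StateBuilder<String, String> {
        private final Map<String, String> state = new HashMap<>();

        @Override
        public String currentState(String event) {
            return state.get(event);
        }

        @Override
        public void updateState(String event) {
            state.put(event, "seen:" + event);
        }
    }
}
```

A test job that needs no verification can use the plain builder, while the upgrade test wraps it and decides inside the verifier how to treat `null` state.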


---

[GitHub] flink pull request #5947: [FLINK-8978] Stateful generic stream job upgrade e...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5947#discussion_r185840683
  
    --- Diff: flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/artificialstate/eventpayload/ArtificialValueStateBuilder.java ---
    @@ -33,21 +33,28 @@
     	private static final long serialVersionUID = -1205814329756790916L;
     
     	private transient ValueState<STATE> valueState;
    +	private transient boolean afterRestoration;
     	private final TypeSerializer<STATE> typeSerializer;
     	private final JoinFunction<IN, STATE, STATE> stateValueGenerator;
    +	private final RestoredStateVerifier<STATE> restoredStateVerifier;
     
     	public ArtificialValueStateBuilder(
     		String stateName,
     		JoinFunction<IN, STATE, STATE> stateValueGenerator,
    -		TypeSerializer<STATE> typeSerializer) {
    -
    +		TypeSerializer<STATE> typeSerializer,
    +		RestoredStateVerifier<STATE> restoredStateVerifier) {
     		super(stateName);
     		this.typeSerializer = typeSerializer;
     		this.stateValueGenerator = stateValueGenerator;
    +		this.restoredStateVerifier = restoredStateVerifier;
     	}
     
     	@Override
     	public void artificialStateForElement(IN event) throws Exception {
    +		if (afterRestoration) {
    --- End diff --
    
    As this is a test job, I think it might not hurt to just check every element after a restore.


---

[GitHub] flink issue #5947: [FLINK-8978] Stateful generic stream job upgrade e2e test

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/5947
  
    Please also double-check: it seems there are files without a license header.


---