Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/11 07:20:00 UTC
[jira] [Commented] (FLINK-9322) Add exception throwing map function
that simulates failures to the general purpose DataStream job
[ https://issues.apache.org/jira/browse/FLINK-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471564#comment-16471564 ]
ASF GitHub Bot commented on FLINK-9322:
---------------------------------------
GitHub user tzulitai opened a pull request:
https://github.com/apache/flink/pull/5990
[FLINK-9322][FLINK-9320] [e2e] Improvements to e2e standalone chaos monkey test
## What is the purpose of the change
This PR is based on #5941. Only the last 2 commits are relevant.
This PR improves our standalone e2e chaos monkey test by:
- Using the general purpose DataStream job, instead of the state machine example, to have a wider coverage of commonly used DataStream program building blocks.
- Letting the running job simulate failures by throwing exceptions, so that the chaos monkey test also exercises recovery from user-code failures.
## Brief change log
- b01cfda Allows the general purpose job to configure whether or not to simulate failures. This resolves FLINK-9322.
- 4009406 In `test_ha.sh`, use the general purpose job instead of the state machine example. With this change, the e2e test also sees failures caused by the user application, not just TM / JM shutdowns. The commit also changes the parameterization of the test script to be consistent with our other e2e test scripts.
## Verifying this change
This is purely a change to improve current e2e tests.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tzulitai/flink chaos-monkey-e2e
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5990.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5990
----
commit 8db7f894b67b00f94148e0314a1c10d76266a350
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date: 2018-04-30T10:04:43Z
[hotfix] [e2e-tests] Make SequenceGeneratorSource usable for 0-size key ranges
commit c8e14673e58aed0f9625e38875ec85a776282ad4
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date: 2018-04-30T10:05:46Z
[FLINK-8971] [e2e-tests] Include broadcast / union state in general purpose DataStream job
commit 78354b295832fa2ec5d829ec4ac21150ecac1231
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date: 2018-05-08T03:44:13Z
PR review - refactor source run function
commit f346fd0958e7c3361886680912630fe22761a63d
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date: 2018-05-08T04:39:40Z
PR review - simplify broadcast / union state verification
commit b01cfda7d77723e8ded2ce99ee12f17352a3ca1f
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date: 2018-05-11T03:51:12Z
[FLINK-9322] [e2e] Add failure simulation to the general purpose DataStream job
commit 4009406d4729486d57cc4a71bcb72d269583a762
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date: 2018-05-11T07:09:00Z
[FLINK-9320] [e2e] Update test_ha e2e to use general purpose DataStream job
----
> Add exception throwing map function that simulates failures to the general purpose DataStream job
> -------------------------------------------------------------------------------------------------
>
> Key: FLINK-9322
> URL: https://issues.apache.org/jira/browse/FLINK-9322
> Project: Flink
> Issue Type: Sub-task
> Components: Tests
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
> Priority: Major
>
> The general purpose DataStream job currently does not have any functionality to simulate user job failures.
> We can achieve this by:
> - Adding a simple pass-through map function that throws exceptions once certain criteria are met
> - To allow for the end-to-end tests that we have in mind, criteria could be to fail after 1) processing X records, and 2) Y completed checkpoints (see FLINK-8977)
> - We should also allow specifying how many times to fail. Some chaos monkey tests (see FLINK-8973) would need to continuously fail several times, while FLINK-8977, for example, only needs to fail once.
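
The criteria described above can be illustrated with a minimal, framework-free sketch. It is not the code from the pull request: the class name `FailureSimulatingMapper`, its constructor parameters, and the `notifyCheckpointComplete()` hook are hypothetical, and plain `java.util.function.Function` stands in for Flink's map function interface so the example runs without any Flink dependency.

```java
import java.util.function.Function;

/**
 * Hypothetical pass-through mapper that throws once both thresholds are reached:
 * at least X processed records and at least Y completed checkpoints.
 * It fails at most maxFailures times and otherwise forwards records unchanged.
 */
class FailureSimulatingMapper<T> implements Function<T, T> {
    private final long failAfterRecords;      // "X" in the issue description
    private final long failAfterCheckpoints;  // "Y" in the issue description
    private final int maxFailures;            // how many times to fail in total

    private long recordsSeen;
    private long checkpointsCompleted;
    private int failuresSoFar;

    FailureSimulatingMapper(long failAfterRecords, long failAfterCheckpoints, int maxFailures) {
        this.failAfterRecords = failAfterRecords;
        this.failAfterCheckpoints = failAfterCheckpoints;
        this.maxFailures = maxFailures;
    }

    /** Assumed hook: the surrounding runtime calls this after each completed checkpoint. */
    void notifyCheckpointComplete() {
        checkpointsCompleted++;
    }

    @Override
    public T apply(T value) {
        recordsSeen++;
        boolean mayStillFail = failuresSoFar < maxFailures;
        boolean criteriaMet = recordsSeen >= failAfterRecords
                && checkpointsCompleted >= failAfterCheckpoints;
        if (mayStillFail && criteriaMet) {
            failuresSoFar++;
            // Reset the counters so a later failure needs the thresholds to be reached again.
            recordsSeen = 0;
            checkpointsCompleted = 0;
            throw new RuntimeException(
                    "Simulated failure " + failuresSoFar + " of " + maxFailures);
        }
        return value; // pass-through: the record itself is never modified
    }
}
```

In this sketch the failure counter lives in the object, which only works because the same instance survives the exception. In a real job the function is re-created on restart, so the number of past failures would have to come from somewhere that survives a restart, for instance the task's attempt number; that part is left out here.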
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)