Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/11 07:20:00 UTC

[jira] [Commented] (FLINK-9322) Add exception throwing map function that simulates failures to the general purpose DataStream job

    [ https://issues.apache.org/jira/browse/FLINK-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471564#comment-16471564 ] 

ASF GitHub Bot commented on FLINK-9322:
---------------------------------------

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/5990

    [FLINK-9322][FLINK-9320] [e2e] Improvements to e2e standalone chaos monkey test

    ## What is the purpose of the change
    
    This PR is based on #5941. Only the last 2 commits are relevant.
    
    This PR improves our standalone e2e chaos monkey test by:
    - Using the general purpose DataStream job instead of the state machine example, for wider coverage of commonly used DataStream program building blocks.
    - Letting the running job simulate failures by throwing exceptions, which makes the chaos monkey test more intensive.
    
    ## Brief change log
    
    - b01cfda Makes failure simulation in the general purpose job configurable. This resolves FLINK-9322.
    - 4009406 Uses the general purpose job in `test_ha.sh` instead of the state machine example. With this change, the e2e test also covers failures caused by the user application, not just TM / JM shutdowns. It also changes the parameterization of the test script to be consistent with our other e2e test scripts.
    
    ## Verifying this change
    
    This is purely a change to improve current e2e tests.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink chaos-monkey-e2e

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5990
    
----
commit 8db7f894b67b00f94148e0314a1c10d76266a350
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-30T10:04:43Z

    [hotfix] [e2e-tests] Make SequenceGeneratorSource usable for 0-size key ranges

commit c8e14673e58aed0f9625e38875ec85a776282ad4
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-30T10:05:46Z

    [FLINK-8971] [e2e-tests] Include broadcast / union state in general purpose DataStream job

commit 78354b295832fa2ec5d829ec4ac21150ecac1231
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-05-08T03:44:13Z

    PR review - refactor source run function

commit f346fd0958e7c3361886680912630fe22761a63d
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-05-08T04:39:40Z

    PR review - simplify broadcast / union state verification

commit b01cfda7d77723e8ded2ce99ee12f17352a3ca1f
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-05-11T03:51:12Z

    [FLINK-9322] [e2e] Add failure simulation to the general purpose DataStream job

commit 4009406d4729486d57cc4a71bcb72d269583a762
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-05-11T07:09:00Z

    [FLINK-9320] [e2e] Update test_ha e2e to use general purpose DataStream job

----


> Add exception throwing map function that simulates failures to the general purpose DataStream job
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-9322
>                 URL: https://issues.apache.org/jira/browse/FLINK-9322
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Major
>
> The general purpose DataStream job currently does not have any functionality to simulate user job failures.
> We can achieve this by:
> - Adding a simple pass-through map function that throws exceptions once certain criteria are met
> - To allow for the end-to-end tests that we have in mind, the criteria could be to fail after 1) processing X records, and 2) completing Y checkpoints (see FLINK-8977)
> - We should also allow specifying how many times to fail. Some chaos monkey tests (see FLINK-8973) would need to continuously fail several times, while FLINK-8977, for example, only needs to fail once.
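
The three points above can be sketched in plain Java as follows. This is only an illustration of the described criteria, not the code from the pull request: the class name `FailureSimulatingMapper`, its constructor parameters, and the counter handling are all hypothetical, and the sketch deliberately avoids Flink types so that it stays self-contained. In an actual job, the logic would presumably sit in a Flink `MapFunction` that is told about completed checkpoints (for example through `CheckpointListener`), and the failure count would have to survive restarts, which this sketch does not attempt.

```java
// Hypothetical, Flink-free sketch of a pass-through mapper that simulates failures.
// All names and parameters here are assumptions made for illustration.
class FailureSimulatingMapper<T> {

    private final long minRecordsBeforeFailure;      // "X records"
    private final long minCheckpointsBeforeFailure;  // "Y completed checkpoints"
    private final int maxFailures;                   // how many times to fail in total

    private long recordsSeen;
    private long checkpointsCompleted;
    private int failuresSoFar;

    FailureSimulatingMapper(long minRecordsBeforeFailure,
                            long minCheckpointsBeforeFailure,
                            int maxFailures) {
        this.minRecordsBeforeFailure = minRecordsBeforeFailure;
        this.minCheckpointsBeforeFailure = minCheckpointsBeforeFailure;
        this.maxFailures = maxFailures;
    }

    /** Returns the input unchanged, or throws once both criteria are met. */
    T map(T value) {
        recordsSeen++;
        boolean mayStillFail = failuresSoFar < maxFailures;
        boolean criteriaMet = recordsSeen >= minRecordsBeforeFailure
                && checkpointsCompleted >= minCheckpointsBeforeFailure;
        if (mayStillFail && criteriaMet) {
            failuresSoFar++;
            // Reset the counters so that the next failure needs fresh progress.
            recordsSeen = 0;
            checkpointsCompleted = 0;
            throw new RuntimeException(
                    "Simulated failure " + failuresSoFar + " of " + maxFailures);
        }
        return value;
    }

    /** To be called whenever a checkpoint has completed. */
    void notifyCheckpointComplete() {
        checkpointsCompleted++;
    }

    public static void main(String[] args) {
        // Fail once, after at least 2 records and 1 completed checkpoint.
        FailureSimulatingMapper<String> mapper = new FailureSimulatingMapper<>(2, 1, 1);
        mapper.map("a");
        mapper.map("b");
        mapper.notifyCheckpointComplete();
        try {
            mapper.map("c");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Setting `maxFailures` to 1 corresponds to the fail-once case mentioned for FLINK-8977, while a larger value corresponds to the repeated failures that the chaos monkey tests need.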



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)