Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/27 07:19:00 UTC

[jira] [Commented] (FLINK-8992) Implement source and operator that validate exactly-once

    [ https://issues.apache.org/jira/browse/FLINK-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456042#comment-16456042 ] 

ASF GitHub Bot commented on FLINK-8992:
---------------------------------------

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/5925

    [FLINK-8992] [e2e-tests] General purpose DataStream e2e test

    ## What is the purpose of the change
    
    This PR adds a general purpose Flink DataStream end-to-end test job that covers the main DataStream APIs and primitives and can be shared across several different end-to-end tests.
    
    The initial version covers the following:
    - A generic Kryo input type.
    - A state type for which we register a `KryoSerializer`.
    - Operators with `ValueState`.
    - Verification of exactly-once / at-least-once semantics.
    - Configurable state backends.
    
    For a full list of what we plan to add over time, please see the description in the umbrella JIRA FLINK-8971.
    
    ## Brief change log
    
    - a936aaa initial version of the general purpose job by @StefanRRichter 
    - c7127a9 integrate job with end-to-end tests' current Maven project structure
    - fa82e2c Fix exactly-once bug in source, and allow enabling checkpointing
    - 3182f0c Adapt savepoint e2e test to use the general purpose job.
    
    ## Verifying this change
    
    The savepoint e2e test serves as a good, generic e2e test script for the general purpose job.
    It runs the job, takes a savepoint, and resumes again (potentially with a different parallelism).
    Metrics monitoring is done in the test script to ensure that the job has made progress.
    The job itself verifies exactly-once semantics; if no guarantee is violated, the job produces no output. At the end, the test script checks whether the job produced any output (by checking the .out files).
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-8992

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5925.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5925
    
----
commit a936aaa65b6583cabc8c4ae7269a4a55ac48dd84
Author: Stefan Richter <s....@...>
Date:   2018-03-15T19:20:45Z

    [FLINK-8992] [e2e-tests] Initial general purpose DataStream job

commit c7127a9b746c1c51b384fea5050f5a041df30954
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-25T09:05:24Z

    [FLINK-8992] [e2e-tests] Integrate general DataStream test job with project structure
    
    This also includes minor cleanup of WIP code in the test job.

commit 253039b49d16f4971f237e6f808080bd7a3599a2
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-26T09:22:59Z

    [FLINK-8992] [e2e-tests] Add Javadocs for DataStreamAllroundTestProgram

commit fa82e2c56025a28cc7238b67b9595aa58690bd09
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-27T06:44:44Z

    [FLINK-8992] [e2e-tests] Ensure exactly-once in general purpose DataStream job

commit e71c5374f56a514332dc9a0a5716eddbfb8c6b62
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-27T07:01:13Z

    [FLINK-8992] [e2e-tests] Configurable source throttling for general purpose DataStream job

commit 3182f0ce8048923a652f7b5dd453b35e92740906
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-27T06:46:09Z

    [FLINK-8992] [e2e-tests] Let savepoint e2e test use general purpose DataStream job

----


> Implement source and operator that validate exactly-once
> --------------------------------------------------------
>
>                 Key: FLINK-8992
>                 URL: https://issues.apache.org/jira/browse/FLINK-8992
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.5.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Major
>
> We can build this with sources that emit sequences per key and a stateful (keyed) operator that validates, on each update of a key, that the new value is the old value + 1. This makes it easy to detect whether events or state were lost or duplicated.
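
The per-key sequence check described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the pull request: it uses no Flink APIs, the class name `SequenceValidator` and its `validate` method are hypothetical, and a `HashMap` stands in for the keyed state that a real job would keep in something like `ValueState`. It also assumes the first value seen for a key is accepted as-is.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the per-key "+1" check; deliberately uses no Flink APIs.
// The class and method names are hypothetical and not taken from the pull request.
final class SequenceValidator {

    // Last value seen per key. In a real job this would be keyed state
    // (for example a ValueState), so that it is covered by checkpoints.
    private final Map<Integer, Long> lastSeen = new HashMap<>();

    // Returns null if the value is the expected successor for its key,
    // otherwise a short description of the violation.
    // Assumption: the first value seen for a key is accepted as-is.
    String validate(int key, long value) {
        Long previous = lastSeen.put(key, value);
        if (previous == null || value == previous + 1) {
            return null;
        }
        if (value <= previous) {
            return "key " + key + ": duplicate or replay (" + previous + " -> " + value + ")";
        }
        return "key " + key + ": gap, events lost (" + previous + " -> " + value + ")";
    }

    public static void main(String[] args) {
        SequenceValidator validator = new SequenceValidator();
        // {key, value} pairs: key 1 repeats value 1, key 2 skips value 1.
        long[][] events = {{1, 0}, {2, 0}, {1, 1}, {1, 1}, {2, 2}};
        for (long[] event : events) {
            String violation = validator.validate((int) event[0], event[1]);
            if (violation != null) {
                System.out.println(violation);
            }
        }
    }
}
```

In this sketch a clean run prints nothing, which mirrors the idea of a job that only emits output when a guarantee is violated. A repeated value is reported as a duplicate, and a skipped value is reported as a gap.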



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)