You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by tzulitai <gi...@git.apache.org> on 2018/04/30 10:33:38 UTC

[GitHub] flink pull request #5941: [FLINK-8971] [e2e] Include broadcast / union state...

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/5941

    [FLINK-8971] [e2e] Include broadcast / union state in general purpose DataStream job

    ## What is the purpose of the change
    
    This PR extends the general purpose DataStream job to contain an operator that uses broadcast and union state.
    
    The operator is self-verifiable such that on restore of the job, if restored broadcast state / union state is incorrect, an exception will be thrown to fail the job.
    
    The fact that the `test_resume_savepoint` job uses the general purpose DataStream job ensures that we have an e2e test that covers broadcast / union state savepointing and resuming.
    
    ## Brief change log
    
    - 8315969 a preliminary fix that allows the sequence generator to have only one key
    - eb8df23 Introduces an operator that uses broadcast and union state
    
    ## Verifying this change
    
    This is an extension to existing tests.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-8971

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5941
    
----
commit 83159697b4d856b109863baee9d880bd2f1b4c86
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-30T10:04:43Z

    [hotfix] [e2e-tests] Make SequenceGeneratorSource usable for 0-size key ranges

commit eb8df230d1963ac7ca0e819aa5eb203d8a8682bf
Author: Tzu-Li (Gordon) Tai <tz...@...>
Date:   2018-04-30T10:05:46Z

    [FLINK-8971] [e2e-tests] Include broadcast / union state in general purpose DataStream job

----


---

[GitHub] flink issue #5941: [FLINK-8971] [e2e] Include broadcast / union state in gen...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/5941
  
    r: @kl0u @StefanRRichter 


---

[GitHub] flink issue #5941: [FLINK-8971] [e2e] Include broadcast / union state in gen...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/5941
  
    I think it is safe to merge this change.
    Will merge this now ..


---

[GitHub] flink issue #5941: [FLINK-8971] [e2e] Include broadcast / union state in gen...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/5941
  
    @StefanRRichter the PR is ready for another review, thanks!


---

[GitHub] flink pull request #5941: [FLINK-8971] [e2e] Include broadcast / union state...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5941


---

[GitHub] flink issue #5941: [FLINK-8971] [e2e] Include broadcast / union state in gen...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/5941
  
    As discussed separately, I think that this could be simplified quiet a bit and that we do not require connected streams. The operator state does not have to depend on the input events. For example, it could just be something that is correlated with the subtask index, and you can reason about repartitioning by that.


---

[GitHub] flink pull request #5941: [FLINK-8971] [e2e] Include broadcast / union state...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5941#discussion_r186415017
  
    --- Diff: flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/SequenceGeneratorSource.java ---
    @@ -92,42 +92,63 @@
     	@Override
     	public void run(SourceContext<Event> ctx) throws Exception {
     
    -		Random random = new Random();
    +		if (keyRanges.size() > 0) {
    --- End diff --
    
    One suggestion because this methods became a bit long, why not break it down a bit? I would introduce two private methods, e.g., `runActive()` and `runIdle()` which are called in the `if` branches.


---