You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/14 15:11:54 UTC

[GitHub] [flink] StephanEwen opened a new pull request #12155: [various] Cleanups and improvements to the SourceOperators

StephanEwen opened a new pull request #12155:
URL: https://github.com/apache/flink/pull/12155


   ## What is the purpose of the change
   
   This PR contains various small fixes and improvements for the `SourceOperator` implementation.
   Nothing is any change in user-facing behavior, these are only improvements for better future maintenance.
   
   ## Brief change log
   
   ### (1) Simplify State Access in SourceOperator
   
   The SourceOperator has some boiler plate code taking the bytes out of the `ListState<byte[]>` and applying the `SimpleVersionedSerializer` to turn them into the splits.
   
   This change encapsulates that code in a utility class `SimpleVersionedListState<SplitT>` which wraps a `ListState<byte[]>` and applies the serialization and de-serialization.
   
   ### (2) Initialization of the SourceOperator
   
   Before this change, the `SourceOperator` takes a `Source` in the constructor.
   All actual components that the `SourceOperator` relies on when working are lazily initialized, in `open()` or via setters. This change moves towards more eager initialization, as is the purpose of the new `SourceOperatorFactory`-based appraoch.
   
   Relying on something as broad as Source also means that a lot of redundant context has to be provided to the `SourceOperator` during initialization. The Source is, for example, also responsible for the `SourceEnumerator`, which is independent of the `SourceOperator`. However, it needed to be considered during testing, because the tests need to mock a full `Source` in order to instantiate a SourceOperator.
   
   This change passes the collaborators of the `SourceOperator` directly eagerly into the constructor. It is not fully possible with the `SourceReader`, but for that we can still reduce the scope by passing a targeted factory function.
   
   ## Verifying this change
   
   This PR does not change any behavior, only internal design. The functionality is already covered by existing unit tests. The PR adjusts the relevant teste where needed.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **no**
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
     - The serializers: **no**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: **no**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **no**
     - If yes, how is the feature documented? **not applicable**
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-628719416


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b264bba83656c10c6db6ce0b47f9f122bdc8e30",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9b264bba83656c10c6db6ce0b47f9f122bdc8e30",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b264bba83656c10c6db6ce0b47f9f122bdc8e30 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] StephanEwen commented on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-629663514


   Manually merged in 939625f2c84bdce6872548d3df672f492e33a704


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] StephanEwen commented on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-629338934


   @becketqin This is based on the changes we discussed before.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] StephanEwen closed pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
StephanEwen closed pull request #12155:
URL: https://github.com/apache/flink/pull/12155


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] StephanEwen commented on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-629631846


   Thanks! Merging this now...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-628702147


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit f9eeff7ed42ae72953e4d6a8067a61e677c8383d (Thu May 14 15:15:31 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
    * **Invalid pull request title: No valid Jira ID provided**
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-628719416


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b264bba83656c10c6db6ce0b47f9f122bdc8e30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1323",
       "triggerID" : "9b264bba83656c10c6db6ce0b47f9f122bdc8e30",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b264bba83656c10c6db6ce0b47f9f122bdc8e30 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1323) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-628719416


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b264bba83656c10c6db6ce0b47f9f122bdc8e30",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1323",
       "triggerID" : "9b264bba83656c10c6db6ce0b47f9f122bdc8e30",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b264bba83656c10c6db6ce0b47f9f122bdc8e30 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1323) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] StephanEwen commented on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-628880625


   The test failure looks unrelated to these changes (Kafka 0.11 ExactlyOnceProducer Test)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12155: [various] Cleanups and improvements to the SourceOperators

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12155:
URL: https://github.com/apache/flink/pull/12155#issuecomment-628702147


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 9b264bba83656c10c6db6ce0b47f9f122bdc8e30 (Fri Oct 16 10:33:16 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
    * **Invalid pull request title: No valid Jira ID provided**
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org