Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/03 17:56:35 UTC

[GitHub] [flink] StephanEwen opened a new pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

StephanEwen opened a new pull request #13538:
URL: https://github.com/apache/flink/pull/13538


   ## What is the purpose of the change
   
   This adds the `LocalityAwareSplitAssigner` for the new File Source API. The new assigner works exactly the same way as the `LocatableInputSplitAssigner` from the *InputFormat API*.
   
   This new split assigner is also the default now for the `FileSource`.
   
   The code of the `LocalityAwareSplitAssigner` is largely a copy of the code from the `LocatableInputSplitAssigner`, adjusted for the different interface methods and cleaned up (code style / checkstyle).
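   
   As a rough illustration of what "locality-aware" assignment means, here is a minimal sketch. It is not the code from this PR: `SketchSplit` and `LocalityPreferringAssignerSketch` are hypothetical names, and the selection rule (first local split, otherwise the first remaining split) is a simplifying assumption. The actual assigner may rank candidates differently.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Iterator;
   import java.util.List;
   import java.util.Optional;
   
   // Hypothetical, simplified stand-in for a file split:
   // a path plus the hosts that store the split's data.
   final class SketchSplit {
       final String path;
       final List<String> hosts;
   
       SketchSplit(String path, String... hosts) {
           this.path = path;
           this.hosts = Arrays.asList(hosts);
       }
   }
   
   // Hypothetical assigner: prefers a split stored on the requesting host,
   // otherwise hands out any remaining split.
   // Not thread-safe; assumes calls come from a single thread.
   final class LocalityPreferringAssignerSketch {
       private final List<SketchSplit> unassigned;
   
       LocalityPreferringAssignerSketch(List<SketchSplit> splits) {
           this.unassigned = new ArrayList<>(splits);
       }
   
       Optional<SketchSplit> getNext(String requestingHost) {
           // First pass: look for a split that is local to the requesting host.
           if (requestingHost != null) {
               for (Iterator<SketchSplit> it = unassigned.iterator(); it.hasNext(); ) {
                   SketchSplit split = it.next();
                   if (split.hosts.contains(requestingHost)) {
                       it.remove();
                       return Optional.of(split);
                   }
               }
           }
           // Fallback: no local split is left, so assign a remote one if any remain.
           if (unassigned.isEmpty()) {
               return Optional.empty();
           }
           return Optional.of(unassigned.remove(0));
       }
   }
   ```
   
   In this sketch a reader on `host-b` first receives the split stored on `host-b` and only afterwards a remote split, so remote reads happen only once no local work is left.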
   
   ## Verifying this change
   
   This copies and adjusts the tests from the `LocatableInputSplitAssigner`, except for the concurrency-related tests. Those are dropped, because the new FileSplitAssigners are not concurrent.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **no**
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **yes, but only unreleased classes**
     - The serializers: **no**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: **no**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **no**
     - If yes, how is the feature documented? **not applicable**
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13538:
URL: https://github.com/apache/flink/pull/13538#issuecomment-703142651


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9f410a16f7ca507643207cda41edc08d93cbce19",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7183",
       "triggerID" : "9f410a16f7ca507643207cda41edc08d93cbce19",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9f410a16f7ca507643207cda41edc08d93cbce19 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7183) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] StephanEwen edited a comment on pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
StephanEwen edited a comment on pull request #13538:
URL: https://github.com/apache/flink/pull/13538#issuecomment-706104404


   ~I would then merge this PR and open a follow-up PR to make sure we have consistent hostnames.~
   
   Turns out this is a very simple addition that fits very well directly in the `LocalityAwareSplitAssigner`, so I'll just fix this on top of this code during merge.
   
   This also has the advantage that we don't drop FQDN information in the source interface in general, in case other (non-file) sources actually need it.
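   
   To make the hostname consistency point concrete, here is a minimal sketch of normalizing names before comparing them. `HostNameSketch.normalize` is a hypothetical helper, not the method added during the merge. It assumes that normalization means keeping the part before the first dot and lower-casing it, and it does not treat IP address literals specially.
   
   ```java
   import java.util.Locale;
   
   final class HostNameSketch {
   
       // Hypothetical helper: reduces a fully qualified domain name to its
       // first label and lower-cases it, so that "Worker-1.example.com" and
       // "worker-1" compare as equal. Assumption: input is a name, not an IP literal.
       static String normalize(String hostOrFqdn) {
           if (hostOrFqdn == null) {
               return null;
           }
           int dot = hostOrFqdn.indexOf('.');
           String shortName = dot < 0 ? hostOrFqdn : hostOrFqdn.substring(0, dot);
           return shortName.toLowerCase(Locale.ROOT);
       }
   }
   ```
   
   Applying the same normalization to both the hosts reported for a split and the host of the requesting reader keeps the locality match from failing just because one side reports an FQDN and the other a short name.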





[GitHub] [flink] flinkbot commented on pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13538:
URL: https://github.com/apache/flink/pull/13538#issuecomment-703142651


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9f410a16f7ca507643207cda41edc08d93cbce19",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9f410a16f7ca507643207cda41edc08d93cbce19",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9f410a16f7ca507643207cda41edc08d93cbce19 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13538:
URL: https://github.com/apache/flink/pull/13538#issuecomment-703142651


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9f410a16f7ca507643207cda41edc08d93cbce19",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7183",
       "triggerID" : "9f410a16f7ca507643207cda41edc08d93cbce19",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9f410a16f7ca507643207cda41edc08d93cbce19 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7183) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] StephanEwen commented on pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on pull request #13538:
URL: https://github.com/apache/flink/pull/13538#issuecomment-706092592


   @JingsongLi True, we should change that in the `SourceReaderContext`: use `NetUtils` to get the hostname from the FQDN.





[GitHub] [flink] asfgit closed pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #13538:
URL: https://github.com/apache/flink/pull/13538


   





[GitHub] [flink] flinkbot commented on pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13538:
URL: https://github.com/apache/flink/pull/13538#issuecomment-703141087


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 5c40b1cdcf7ab70f89587d986198b5bae2c9ffe0 (Sat Oct 03 17:58:28 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] StephanEwen commented on pull request #13538: [FLINK-19498][Connector Files] Port LocatableInputSplitAssigner to new File Source API

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on pull request #13538:
URL: https://github.com/apache/flink/pull/13538#issuecomment-706104404


   I would then merge this PR and open a follow-up PR to make sure we have consistent hostnames.










