Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/15 11:18:57 UTC

[GitHub] [flink] xintongsong opened a new pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

xintongsong opened a new pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863
 
 
   ## What is the purpose of the change
   
   This PR removes the memory validations and overwrites in YarnClusterDescriptor, which are unnecessary and may cause deployment failures.
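   
   To illustrate the kind of accuracy loss the title refers to, here is a minimal sketch rather than the actual Flink code. It assumes a memory size configured in bytes is passed through an integer megabyte value and then expanded back, similar in spirit to the `getTaskManagerMemoryMB() + "m"` round trip visible in the diff. The class and method names are hypothetical.
   
   ```java
   public class MemoryRoundTrip {
       private static final long BYTES_PER_MB = 1024L * 1024L;

       /** Truncates a byte count to whole megabytes, as an int-typed MB field would. */
       public static int toWholeMegaBytes(long bytes) {
           return (int) (bytes / BYTES_PER_MB);
       }

       /** Expands whole megabytes back to bytes. */
       public static long toBytes(int megaBytes) {
           return megaBytes * BYTES_PER_MB;
       }

       /** Bytes that disappear in the bytes -> MB -> bytes round trip. */
       public static long lostBytes(long bytes) {
           return bytes - toBytes(toWholeMegaBytes(bytes));
       }

       public static void main(String[] args) {
           // Hypothetical configured size: 1024.5 MB expressed in bytes.
           long configured = 1024L * BYTES_PER_MB + 512L * 1024L;
           long restored = toBytes(toWholeMegaBytes(configured));
           System.out.println("configured=" + configured
                   + " restored=" + restored
                   + " lost=" + lostBytes(configured));
       }
   }
   ```
   
   If one code path works with the exact byte value while another sees the truncated megabyte value, the two can disagree by up to one megabyte minus one byte, which is one plausible way a later consistency check could reject the configuration.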
   
   ## Brief change log
   
   - df5f8e44c335cdd2b6f8eb6974af28d7bb856ae7: Hotfix to remove unused env keys
   - da1ec060cbef03b127f6f763e14e6ac5c7f82ff7: Hotfix to remove unused numberTaskManagers from ClusterSpecification
   - d924dfce471036d6bcd7cca5d0548d9b78a08ed1: Remove memory validation from YarnClusterDescriptor
   - cd199a2a3b3dba0c38c38df5db11278feabb544d: Remove memory overwriting from YarnClusterDescriptor
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot commented on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-574631052
 
 
   <!--
   Meta data
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   -->
   ## CI report:
   
   * cd199a2a3b3dba0c38c38df5db11278feabb544d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-574631052
 
 
   <!--
   Meta data
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:FAILURE URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/144500744 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:02912689f5ae289d2098eb43ffad48809ad848fe Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:02912689f5ae289d2098eb43ffad48809ad848fe
   -->
   ## CI report:
   
   * cd199a2a3b3dba0c38c38df5db11278feabb544d Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144500744) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367) 
   * 02912689f5ae289d2098eb43ffad48809ad848fe UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-574631052
 
 
   <!--
   Meta data
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:FAILURE URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/144500744 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:02912689f5ae289d2098eb43ffad48809ad848fe Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/144753064 TriggerType:PUSH TriggerID:02912689f5ae289d2098eb43ffad48809ad848fe
   Hash:02912689f5ae289d2098eb43ffad48809ad848fe Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4408 TriggerType:PUSH TriggerID:02912689f5ae289d2098eb43ffad48809ad848fe
   -->
   ## CI report:
   
   * cd199a2a3b3dba0c38c38df5db11278feabb544d Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144500744) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367) 
   * 02912689f5ae289d2098eb43ffad48809ad848fe Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/144753064) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4408) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-574631052
 
 
   <!--
   Meta data
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:FAILURE URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/144500744 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   -->
   ## CI report:
   
   * cd199a2a3b3dba0c38c38df5db11278feabb544d Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144500744) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] xintongsong commented on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
xintongsong commented on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-575168659
 
 
   Thanks for the review @tillrohrmann.
   I've rebased this PR to the latest #10852 and addressed your comments.


[GitHub] [flink] xintongsong commented on a change in pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
xintongsong commented on a change in pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#discussion_r367435764
 
 

 ##########
 File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
 ##########
 @@ -785,15 +783,6 @@ private ApplicationReport startAppMaster(
 		paths.add(remotePathJar);
 		classPathBuilder.append(flinkJarPath.getName()).append(File.pathSeparator);
 
-		// set the right configuration values for the TaskManager
-		configuration.setInteger(
-				TaskManagerOptions.NUM_TASK_SLOTS,
-				clusterSpecification.getSlotsPerTaskManager());
-
-		configuration.set(
-				TaskManagerOptions.TOTAL_PROCESS_MEMORY,
-				MemorySize.parse(clusterSpecification.getTaskManagerMemoryMB() + "m"));
 
 Review comment:
   Then I believe the following code in `YarnClusterDescriptor#validateClusterResources` can also be removed.
   ```
   if (taskManagerMemoryMb < yarnMinAllocationMB) {
   	taskManagerMemoryMb =  yarnMinAllocationMB;
   }
   ```
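   
   A minimal sketch of why this clamp acts as an overwrite, assuming the snippet above is applied to the user-configured value. `MinAllocationClamp` and `clamp` are hypothetical names for illustration, not Flink API, and the numbers are made up.
   
   ```java
   public class MinAllocationClamp {
       /** Mirrors the quoted snippet: raise the request to YARN's minimum allocation. */
       public static int clamp(int taskManagerMemoryMb, int yarnMinAllocationMB) {
           if (taskManagerMemoryMb < yarnMinAllocationMB) {
               taskManagerMemoryMb = yarnMinAllocationMB;
           }
           return taskManagerMemoryMb;
       }

       public static void main(String[] args) {
           // A request below the minimum is silently raised.
           System.out.println(clamp(768, 1024));
           // A request at or above the minimum is left unchanged.
           System.out.println(clamp(2048, 1024));
       }
   }
   ```
   
   In the first call the value used downstream no longer matches what the user configured, which is the kind of overwrite this PR aims to remove.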


[GitHub] [flink] tillrohrmann commented on a change in pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#discussion_r367360336
 
 

 ##########
 File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
 ##########
 @@ -785,15 +783,6 @@ private ApplicationReport startAppMaster(
 		paths.add(remotePathJar);
 		classPathBuilder.append(flinkJarPath.getName()).append(File.pathSeparator);
 
-		// set the right configuration values for the TaskManager
-		configuration.setInteger(
-				TaskManagerOptions.NUM_TASK_SLOTS,
-				clusterSpecification.getSlotsPerTaskManager());
-
-		configuration.set(
-				TaskManagerOptions.TOTAL_PROCESS_MEMORY,
-				MemorySize.parse(clusterSpecification.getTaskManagerMemoryMB() + "m"));
 
 Review comment:
   I think we could change `YarnClusterDescriptor#validateClusterResources` to return a `ClusterSpecification` with the `taskManagerMemoryMB` of the passed-in `ClusterSpecification`.


[GitHub] [flink] tillrohrmann closed pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
tillrohrmann closed pull request #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863
 
 
   


[GitHub] [flink] flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-574631052
 
 
   <!--
   Meta data
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:FAILURE URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/144500744 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   -->
   ## CI report:
   
   * cd199a2a3b3dba0c38c38df5db11278feabb544d Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/144500744) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot commented on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-574617107
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit cd199a2a3b3dba0c38c38df5db11278feabb544d (Wed Jan 15 11:22:10 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10863: [FLINK-15598][yarn] Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.
URL: https://github.com/apache/flink/pull/10863#issuecomment-574631052
 
 
   <!--
   Meta data
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:FAILURE URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:cd199a2a3b3dba0c38c38df5db11278feabb544d Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/144500744 TriggerType:PUSH TriggerID:cd199a2a3b3dba0c38c38df5db11278feabb544d
   Hash:02912689f5ae289d2098eb43ffad48809ad848fe Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/144753064 TriggerType:PUSH TriggerID:02912689f5ae289d2098eb43ffad48809ad848fe
   Hash:02912689f5ae289d2098eb43ffad48809ad848fe Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4408 TriggerType:PUSH TriggerID:02912689f5ae289d2098eb43ffad48809ad848fe
   -->
   ## CI report:
   
   * cd199a2a3b3dba0c38c38df5db11278feabb544d Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144500744) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4367) 
   * 02912689f5ae289d2098eb43ffad48809ad848fe Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144753064) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4408) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>
