Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/24 15:20:00 UTC

[GitHub] [flink] azagrebin opened a new pull request #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

azagrebin opened a new pull request #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940
 
 
   ## What is the purpose of the change
   
   The current conclusion is that explicitly releasing unsafe memory while Java code may still hold references to it is dangerous. This PR reverts the explicit release and relies only on GC, which frees the memory once no references to it remain in Java code. The problem can occur, for example, if a task thread exits without joining its IO threads (e.g. spilling in a batch job): the unsafe memory is released, but an IO thread can still write to it without a segfault. Meanwhile, another task can allocate memory that overlaps the released region, and that IO thread can then corrupt it. We still allocate the memory as unsafe so that it stays outside the JVM direct memory limit and does not interfere with direct allocations. It also does not make sense for RocksDB native memory (also accounted for in `MemoryManager`) to be part of the direct memory limit.
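
To illustrate the intended behaviour, here is a minimal sketch, assuming Java 9+ and `java.lang.ref.Cleaner`. The classes `GcReleasedMemory`, `Owner` and `Segment` are hypothetical stand-ins, not Flink code: `Owner` plays the role of the object wrapping the unsafe memory, and a counter stands in for the actual native release.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; none of these names exist in Flink. */
final class GcReleasedMemory {
    static final Cleaner CLEANER = Cleaner.create();
    /** Counts bytes "released" by cleaner actions (stands in for the native free). */
    static final AtomicLong RELEASED_BYTES = new AtomicLong();

    /** Stands in for the object that wraps a chunk of unsafe memory. */
    static final class Owner {
        final long sizeBytes;

        Owner(long sizeBytes) {
            this.sizeBytes = sizeBytes;
            // The action must not capture `this`, otherwise the owner
            // would never become unreachable.
            final long size = sizeBytes;
            CLEANER.register(this, () -> RELEASED_BYTES.addAndGet(size));
        }
    }

    /** Stands in for a managed segment that refers to the owner. */
    static final class Segment {
        private Owner owner;

        Segment(Owner owner) {
            this.owner = owner;
        }

        /** Drops the reference only; the memory is not released explicitly. */
        void free() {
            owner = null;
        }

        boolean isFreed() {
            return owner == null;
        }
    }
}
```

In this sketch, `free()` only makes the owner eligible for collection. As long as any other code, such as a lingering IO thread, still holds a reference to the owner, the cleaner action cannot run, so the memory cannot be handed to another task underneath it.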
   
   The potential downside is that over-allocation of unsafe memory does not hit the direct memory limit and therefore does not immediately trigger GC, which is now the only way to release it. This can cause out-of-memory failures without GC ever being triggered to release memory that may already be unused.
   
   If the delayed release turns out to be a problem, we can investigate further optimisations, such as:
   - directly monitoring the phantom reference queue of the cleaner (if the JVM quickly detects that there are no more references to the memory) and explicitly releasing memory that is ready for GC as soon as possible, e.g. after Task exit
   - monitoring the amount of allocated memory and blocking allocation until GC releases occupied memory, instead of failing with out-of-memory immediately
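
The second option above (blocking allocation instead of failing immediately) could look roughly like the following sketch. `UnsafeMemoryBudget` and its methods are hypothetical names invented for illustration and are not part of Flink. The `release` call stands in for whatever GC-triggered cleaner action would return the memory.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical budget for unsafe memory; not part of Flink. */
final class UnsafeMemoryBudget {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private long availableBytes;

    UnsafeMemoryBudget(long totalBytes) {
        this.availableBytes = totalBytes;
    }

    /**
     * Waits up to timeoutMillis for enough budget to become available.
     * Returns false on timeout or interruption instead of failing right away.
     * A real implementation might also nudge the GC while waiting.
     */
    boolean reserve(long bytes, long timeoutMillis) {
        long nanosLeft = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (availableBytes < bytes) {
                if (nanosLeft <= 0L) {
                    return false;
                }
                try {
                    nanosLeft = released.awaitNanos(nanosLeft);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            availableBytes -= bytes;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns memory to the budget, e.g. from a cleaner action after GC. */
    void release(long bytes) {
        lock.lock();
        try {
            availableBytes += bytes;
            released.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long available() {
        lock.lock();
        try {
            return availableBytes;
        } finally {
            lock.unlock();
        }
    }
}
```

A caller would invoke `reserve(size, timeout)` before allocating and treat a `false` result as the out-of-memory case. The timeout bounds how long an allocation waits for GC to make progress.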
   
   ## Brief change log
   
   Remove `cleaner` from `HybridMemorySegment` and the `cleaner#run` call from `HybridMemorySegment#free`.
   
   ## Verifying this change
   
   Covered by existing tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (can be)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940#issuecomment-578182575
 
 
   <!--
   Meta data
   Hash:b166a5b0cd6f05d0176d996bc32328438431aa01 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145957644 TriggerType:PUSH TriggerID:b166a5b0cd6f05d0176d996bc32328438431aa01
   Hash:b166a5b0cd6f05d0176d996bc32328438431aa01 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4607 TriggerType:PUSH TriggerID:b166a5b0cd6f05d0176d996bc32328438431aa01
   -->
   ## CI report:
   
   * b166a5b0cd6f05d0176d996bc32328438431aa01 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145957644) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4607) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] azagrebin commented on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

Posted by GitBox <gi...@apache.org>.
azagrebin commented on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940#issuecomment-578175052
 
 
   [flink CI build](https://travis-ci.com/flink-ci/flink/builds/145952099)


[GitHub] [flink] flinkbot edited a comment on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940#issuecomment-578182575
 
 
   <!--
   Meta data
   Hash:b166a5b0cd6f05d0176d996bc32328438431aa01 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/145957644 TriggerType:PUSH TriggerID:b166a5b0cd6f05d0176d996bc32328438431aa01
   Hash:b166a5b0cd6f05d0176d996bc32328438431aa01 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4607 TriggerType:PUSH TriggerID:b166a5b0cd6f05d0176d996bc32328438431aa01
   -->
   ## CI report:
   
   * b166a5b0cd6f05d0176d996bc32328438431aa01 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/145957644) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4607) 
   


[GitHub] [flink] flinkbot commented on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940#issuecomment-578182575
 
 
   <!--
   Meta data
   Hash:b166a5b0cd6f05d0176d996bc32328438431aa01 Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:b166a5b0cd6f05d0176d996bc32328438431aa01
   -->
   ## CI report:
   
   * b166a5b0cd6f05d0176d996bc32328438431aa01 UNKNOWN
   


[GitHub] [flink] tillrohrmann closed pull request #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

Posted by GitBox <gi...@apache.org>.
tillrohrmann closed pull request #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940
 
 
   


[GitHub] [flink] flinkbot commented on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940#issuecomment-578174786
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 89691a86b282574f41f81d52d81da1181df68690 (Fri Jan 24 15:22:21 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940#issuecomment-578182575
 
 
   <!--
   Meta data
   Hash:b166a5b0cd6f05d0176d996bc32328438431aa01 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145957644 TriggerType:PUSH TriggerID:b166a5b0cd6f05d0176d996bc32328438431aa01
   Hash:b166a5b0cd6f05d0176d996bc32328438431aa01 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4607 TriggerType:PUSH TriggerID:b166a5b0cd6f05d0176d996bc32328438431aa01
   -->
   ## CI report:
   
   * b166a5b0cd6f05d0176d996bc32328438431aa01 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145957644) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4607) 
   
