Posted to issues@flink.apache.org by shuai-xu <gi...@git.apache.org> on 2018/07/18 08:05:27 UTC

[GitHub] flink pull request #6360: [FLINK-9884] [runtime] fix slot request may not be...

GitHub user shuai-xu opened a pull request:

    https://github.com/apache/flink/pull/6360

    [FLINK-9884] [runtime] fix slot request may not be removed when it has already be assigned in slot manager

    
    ## What is the purpose of the change
    
    *(This pull request fixes a bug where a slot request may not be removed from pendingSlotRequests in the slot manager after it has been assigned.)*
    
    
    ## Verifying this change
    
    This change added tests and can be verified as follows:
    
      - Added a test in SlotManagerTest.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shuai-xu/flink jira-9884

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6360
    
----
commit 58e24424401d28647e376a9ee32d7b70d9ca2724
Author: shuai-xu <sh...@...>
Date:   2018-07-18T07:54:55Z

    [FLINK-9884] [runtime] fix slot request may not be removed when it has already be assigned in slot manager

commit 4d53107a2817e0e3def8ed31926a7b4a97251c1c
Author: shuai-xu <sh...@...>
Date:   2018-07-18T07:56:50Z

    adjust the import order

----


---

[GitHub] flink issue #6360: [FLINK-9884] [runtime] fix slot request may not be remove...

Posted by tison1 <gi...@git.apache.org>.
Github user tison1 commented on the issue:

    https://github.com/apache/flink/pull/6360
  
    @shuai-xu 
    It makes sense.
    The message that the TM has successfully allocated the slot might be lost in transit.
    When the slot manager receives a slot status report saying that a slot has an allocation ID unrelated to this offer, the slot is allocated to another slot request.
    It looks like this PR protects the runtime from a potential resource leak, doesn't it?


---

[GitHub] flink issue #6360: [FLINK-9884] [runtime] fix slot request may not be remove...

Posted by tison1 <gi...@git.apache.org>.
Github user tison1 commented on the issue:

    https://github.com/apache/flink/pull/6360
  
    > When a task executor reports slotA with allocationId1, it may happen that the slot manager has recorded slotA as assigned to allocationId2, while the slot request with allocationId1 is not assigned. The slot manager then updates itself so that slotA is assigned to allocationId1, but it does not clear the slot request with allocationId1.
    >
    > For example:
    >  1. There is one free slot in the slot manager.
    >  2. Two slot requests arrive, with allocationId1 and allocationId2.
    >  3. The slot is assigned to allocationId1, but the requestSlot call times out.
    >  4. The SlotManager assigns the slot to allocationId2 and re-inserts a slot request with allocationId1.
    >  5. The second requestSlot call to the task executor returns a SlotOccupiedException.
    >  6. The SlotManager updates the slot to allocationId1, but the slot request is left behind.
    
    Quoted from the associated JIRA issue for further discussion.
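
The six steps quoted above can be replayed with a small model of the bookkeeping. This is only a sketch under stated assumptions, not Flink's actual SlotManager: the class `SlotManagerSketch`, the method `handleSlotReport`, the `applyFix` flag, and the re-queuing of the displaced request are hypothetical and exist only for illustration. Only the names `pendingSlotRequests`, `slotA`, `allocationId1`, and `allocationId2` come from the discussion.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the slot bookkeeping; not Flink's real SlotManager. */
class SlotManagerSketch {
    // slot id -> allocation id the slot manager believes holds the slot
    final Map<String, String> slotAssignments = new HashMap<>();
    // allocation ids of slot requests that are still waiting for a slot
    final Set<String> pendingSlotRequests = new HashSet<>();

    /** The task executor says the slot is really held by reportedAllocationId. */
    void handleSlotReport(String slotId, String reportedAllocationId, boolean applyFix) {
        String recorded = slotAssignments.put(slotId, reportedAllocationId);
        if (recorded != null && !recorded.equals(reportedAllocationId)) {
            // Assumption: the request that lost the slot goes back to pending.
            pendingSlotRequests.add(recorded);
        }
        if (applyFix) {
            // The behaviour the PR describes: the fulfilled request must not stay pending.
            pendingSlotRequests.remove(reportedAllocationId);
        }
    }

    /** Replays steps 1 to 6 of the example and returns the resulting state. */
    static SlotManagerSketch replay(boolean applyFix) {
        SlotManagerSketch sm = new SlotManagerSketch();
        // Steps 1-2: one free slot, two incoming requests.
        sm.pendingSlotRequests.add("allocationId1");
        sm.pendingSlotRequests.add("allocationId2");
        // Step 3: slotA goes to allocationId1, but the requestSlot call times out,
        // so the slot manager frees the slot and re-inserts the request.
        sm.pendingSlotRequests.remove("allocationId1");
        sm.slotAssignments.put("slotA", "allocationId1");
        sm.slotAssignments.remove("slotA");
        sm.pendingSlotRequests.add("allocationId1");
        // Step 4: slotA is now assigned to allocationId2.
        sm.pendingSlotRequests.remove("allocationId2");
        sm.slotAssignments.put("slotA", "allocationId2");
        // Steps 5-6: the task executor reports slotA as occupied by allocationId1.
        sm.handleSlotReport("slotA", "allocationId1", applyFix);
        return sm;
    }
}
```

Calling `SlotManagerSketch.replay(false)` leaves `allocationId1` in `pendingSlotRequests` even though slotA is recorded as assigned to it, which is the stale request described in the JIRA issue. Calling `replay(true)` removes it.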


---