Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/06/28 11:00:59 UTC

[GitHub] [flink] gaoyunhaii opened a new pull request #8925: [FLINK-12852][network] Fix the deadlock occured when requesting exclusive buffers

gaoyunhaii opened a new pull request #8925: [FLINK-12852][network] Fix the deadlock occured when requesting exclusive buffers
URL: https://github.com/apache/flink/pull/8925
 
 
   ## What is the purpose of the change
   
   This pull request aims to fix a deadlock that can occur while requesting exclusive buffers. Because the maximum number of buffers and the number of required buffers currently differ for local buffer pools, the local buffer pools of the upstream tasks can occupy all the buffers, leaving the downstream tasks unable to acquire the exclusive buffers they need to make progress. Although the problem can be avoided by increasing the total number of buffers, the deadlock itself may not be acceptable. Therefore, this PR fails over the current execution when the deadlock occurs, and the exception message advises users to increase the number of buffers.
   
   ## Brief change log
   
   The main changes are:
   
   1. Add an option for the timeout of `requestMemorySegment` for each channel. The default timeout is 30s. This option is marked as undocumented since it may be removed in a future implementation.
   2. Pass the timeout to `NetworkBufferPool`.
   3. `requestMemorySegments` throws `IOException("Insufficient buffer")` if not all segments have been acquired when the timeout expires.
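
The behaviour described in item 3 can be sketched as follows. This is a minimal illustration, not the code in this PR: `TimedSegmentPool`, its constructor parameters, and `availableCount` are hypothetical names, segments are modelled as plain `byte[]` arrays, and the real `NetworkBufferPool` is assumed to differ in its details. The sketch only shows the idea of waiting against a single deadline, failing with `IOException` when the deadline passes, and returning any partially acquired segments to the pool.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a global buffer pool; NOT Flink's NetworkBufferPool. */
final class TimedSegmentPool {
    private final BlockingQueue<byte[]> availableSegments;
    private final long timeoutMillis;

    TimedSegmentPool(int totalSegments, int segmentSize, long timeoutMillis) {
        this.availableSegments = new ArrayBlockingQueue<>(totalSegments);
        for (int i = 0; i < totalSegments; i++) {
            availableSegments.add(new byte[segmentSize]);
        }
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns exactly {@code count} segments, or throws once the timeout has elapsed. */
    List<byte[]> requestMemorySegments(int count) throws IOException {
        List<byte[]> segments = new ArrayList<>(count);
        // One deadline covers the whole request, not each individual segment.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean success = false;
        try {
            while (segments.size() < count) {
                long remainingNanos = deadline - System.nanoTime();
                byte[] segment = remainingNanos > 0
                        ? availableSegments.poll(remainingNanos, TimeUnit.NANOSECONDS)
                        : availableSegments.poll();
                if (segment == null) {
                    throw new IOException("Insufficient buffer: acquired " + segments.size()
                            + " of " + count + " segments within " + timeoutMillis
                            + " ms; consider increasing the number of buffers.");
                }
                segments.add(segment);
            }
            success = true;
            return segments;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while requesting memory segments", e);
        } finally {
            if (!success) {
                // Give back whatever was acquired so other requesters can proceed.
                recycleMemorySegments(segments);
            }
        }
    }

    void recycleMemorySegments(List<byte[]> segments) {
        availableSegments.addAll(segments);
    }

    int availableCount() {
        return availableSegments.size();
    }
}
```

With this shape, a caller that cannot obtain all of its segments fails with an exception that can trigger a failover, rather than blocking indefinitely while holding a partial allocation.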
   
   ## Verifying this change
   
   1. Added a test that validates that `requestMemorySegments` ends exceptionally if not all segments have been acquired when the timeout expires.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **no**
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
     - The serializers: **no**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **no**
     - If yes, how is the feature documented? **not applicable**
   
