Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/04 09:34:00 UTC
[jira] [Commented] (FLINK-10319) Too many requestPartitionState would crash JM

    [ https://issues.apache.org/jira/browse/FLINK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637986#comment-16637986 ] 

ASF GitHub Bot commented on FLINK-10319:
----------------------------------------

tillrohrmann commented on issue #6680: [FLINK-10319] [runtime] Too many requestPartitionState would crash JM
URL: https://github.com/apache/flink/pull/6680#issuecomment-426950862
 
 
   Thanks for opening this PR @TisonKun. 
   
   Before diving into the details of this PR, I'd like to know whether you've actually observed the JM crashing, or whether this is more of a theoretical concern. If it does indeed crash, I would be interested to learn why, because the `requestPartitionState` method should not be blocking at all. How many `requestPartitionState` messages are generated in the crash case?
   
   Another question concerns your assumptions: you said that `retriggerPartitionRequest` would fail if the producer is gone. By producer, do you mean the producing `Task` or the `TaskManager`? In the former case, I think the remote `TaskManager` would simply respond with a `PartitionNotFoundException`, which retriggers the same partition request method again. Thus, I'm not quite sure whether the consumer task would actually fail or simply retry infinitely. The latter outcome is, IMO, what we try to prevent by asking the JM about the state of the result partition.
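
   To make the concern concrete, here is a minimal, hypothetical sketch of the two retry strategies under discussion. None of these types are Flink's real classes: `Response`, `Outcome`, `requestPartition`, and the `maxAttempts` cap are assumptions introduced only for illustration (the cap exists so the sketch terminates; hitting it stands in for "retrying forever"). The producer check is also simplified to a single alive/gone flag.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical model of the retry flow; not Flink's actual implementation. */
final class PartitionRetrySketch {

    /** Assumed stand-in for the answer of the producer's TaskManager. */
    enum Response { DATA, PARTITION_NOT_FOUND }

    /** Assumed outcomes of the consumer's attempt to obtain the partition. */
    enum Outcome { CONSUMED, CONSUMER_FAILED, RETRIES_EXHAUSTED }

    /**
     * Requests the partition and retries on PARTITION_NOT_FOUND.
     *
     * @param remoteTaskManager answers each partition request
     * @param producerAlive     models asking the JM whether the producer still exists
     * @param askJobManager     if true, consult the JM before every retry
     * @param maxAttempts       artificial cap; exhausting it models an endless retry loop
     */
    static Outcome requestPartition(
            Supplier<Response> remoteTaskManager,
            BooleanSupplier producerAlive,
            boolean askJobManager,
            int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (remoteTaskManager.get() == Response.DATA) {
                return Outcome.CONSUMED;
            }
            // Partition not found: either retry blindly, or ask the JM first
            // and fail the consumer once the producer is known to be gone.
            if (askJobManager && !producerAlive.getAsBoolean()) {
                return Outcome.CONSUMER_FAILED;
            }
        }
        return Outcome.RETRIES_EXHAUSTED;
    }

    private PartitionRetrySketch() {}
}
```

   In this model, a producing task that is gone while its `TaskManager` keeps answering "partition not found" makes the blind variant (`askJobManager = false`) use up every attempt, whereas the JM-checked variant fails the consumer after the first miss. If the partition merely shows up late, both variants end up consuming it.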

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Too many requestPartitionState would crash JM
> ---------------------------------------------
>
>                 Key: FLINK-10319
>                 URL: https://issues.apache.org/jira/browse/FLINK-10319
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.7.0
>            Reporter: tison
>            Assignee: tison
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> Do not call requestPartitionState on the JM when a partition request fails; doing so may generate too many RPC requests and block the JM.
> We gain little from checking what state the producer is in, while, on the other hand, the flood of RPC requests can crash the JM. A Task can always retriggerPartitionRequest from its InputGate: it fails if the producer is gone and succeeds if the producer is alive. Either way, there is no need to ask the JM for help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)