Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2018/10/04 09:34:33 UTC

[GitHub] tillrohrmann edited a comment on issue #6680: [FLINK-10319] [runtime] Too many requestPartitionState would crash JM

tillrohrmann edited a comment on issue #6680: [FLINK-10319] [runtime] Too many requestPartitionState would crash JM
URL: https://github.com/apache/flink/pull/6680#issuecomment-426950862

Thanks for opening this PR @TisonKun.

Before diving into the details of this PR, I'd like to know whether you've actually observed the JM crashing, or whether this is more of a theoretical concern. If it does indeed crash, I would be interested to learn why, because the `requestPartitionState` method should not block at all. How many `requestPartitionState` messages are generated in the crash case?

Another question concerns your assumptions: you said that `retriggerPartitionRequest` would fail if the producer is gone. By producer, do you mean the producing `Task` or the `TaskManager`? In the former case, I think the remote `TaskManager` would simply respond with a `PartitionNotFoundException`, which retriggers the same partition request method again. Thus, I'm not quite sure whether the consumer task would actually fail or simply retry indefinitely. The latter outcome is, imo, what we try to prevent by asking the JM about the state of the result partition.
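
To make the retry concern above concrete, here is a rough, self-contained sketch. It is not Flink code: `PartitionRetrySketch`, `ProducerState`, `Outcome`, and `requestPartition` are hypothetical names, and the `maxAttempts` bound exists only so the simulation terminates. It assumes that a "partition not found" answer leads to another request, and that the consumer fails only when a (simulated) JM reports the producer as failed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of the retry behaviour discussed above; not Flink's actual API.
final class PartitionRetrySketch {

    /** Simplified producer state, as a simulated JM might report it. */
    enum ProducerState { RUNNING, FAILED }

    /** What happens to the consumer in this simulation. */
    enum Outcome { CONSUMING, FAILED_CONSUMER, STILL_RETRYING }

    /**
     * Retriggers the partition request whenever the partition is not found.
     *
     * @param partitionAvailable true once the remote side can serve the partition
     * @param reportedState      state reported by the simulated JM, or null if the JM is never asked
     * @param maxAttempts        artificial bound so that the simulation ends
     */
    static Outcome requestPartition(BooleanSupplier partitionAvailable,
                                    ProducerState reportedState,
                                    int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (partitionAvailable.getAsBoolean()) {
                return Outcome.CONSUMING;
            }
            // Partition not found: with a JM check, a failed producer ends the loop.
            if (reportedState == ProducerState.FAILED) {
                return Outcome.FAILED_CONSUMER;
            }
            // Otherwise fall through and retrigger the same request.
        }
        // Without the bound, this case would retry forever.
        return Outcome.STILL_RETRYING;
    }

    public static void main(String[] args) {
        // Producer is gone and the JM is never asked: the consumer keeps retrying.
        System.out.println(requestPartition(() -> false, null, 5));

        // Producer is gone and the JM reports FAILED: the consumer fails instead of looping.
        System.out.println(requestPartition(() -> false, ProducerState.FAILED, 5));

        // Producer is merely slow: the partition shows up on the third attempt.
        int[] calls = {0};
        System.out.println(requestPartition(() -> ++calls[0] >= 3, ProducerState.RUNNING, 10));
    }
}
```

In this model the only thing that breaks the loop for a vanished producer is the state reported by the JM, which is why I'd like to understand the actual failure mode before we limit or remove those requests.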

I would like to hear @uce's opinion on this as well, because he used to work on this part of the code.
