
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2018/07/26 07:46:15 UTC

[GitHub] zhijiangW commented on issue #6103: [FLINK-9413] [distributed coordination] Tasks can fail with Partition…

URL: https://github.com/apache/flink/pull/6103#issuecomment-408008057
 
 
   We have also run into this exception in our large-scale applications, and we worked around it by increasing `taskmanager.network.request-backoff.max` at the cluster level.
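
   To put a number on what raising that setting buys, here is a minimal Java sketch of an exponential partition-request backoff. It assumes the consumer doubles its retry delay starting from `taskmanager.network.request-backoff.initial`, caps it at `taskmanager.network.request-backoff.max`, and fails the request once a delay at the cap has elapsed without success. That schedule is an assumption about the input-channel retry logic, not code copied from Flink, and the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of an exponential partition-request backoff.
 * Assumed behavior: delay starts at the initial value, doubles after each
 * failed request, is capped at the max, and the request fails once a
 * delay equal to the max has already been waited.
 */
public class PartitionRequestBackoff {

    /** Total time (ms) a consumer keeps retrying before giving up. */
    static long totalWaitMs(long initialBackoffMs, long maxBackoffMs) {
        if (initialBackoffMs <= 0 || maxBackoffMs < initialBackoffMs) {
            throw new IllegalArgumentException("need 0 < initial <= max");
        }
        long total = 0;
        long current = 0; // 0 means no backoff has been applied yet
        while (true) {
            if (current == 0) {
                current = initialBackoffMs;                    // first retry
            } else if (current < maxBackoffMs) {
                current = Math.min(current * 2, maxBackoffMs); // double, capped at max
            } else {
                return total;                                  // cap reached: give up
            }
            total += current;
        }
    }

    public static void main(String[] args) {
        System.out.println("max=10s -> " + totalWaitMs(100, 10_000) + " ms");
        System.out.println("max=60s -> " + totalWaitMs(100, 60_000) + " ms");
    }
}
```

   With assumed settings of 100 ms initial and 10 s max, the delays are 100, 200, 400, ..., 6400, 10000 ms, so the consumer gives up after about 22.7 s; raising the max to 60 s stretches that window to about 162 s. The same growth is what would slow down tests that exercise this failure path.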
   
   I agree with keeping the current default value in the code, because a larger value may slow down the unit tests and ITCases. Users who hit this exception can then adjust the setting at the job or cluster level.
   
   Maybe we could also register the task with the network stack as early as possible during startup, for example by calling `registerTask` before the blob cache processing. That may avoid the unnecessary failover in most cases. :)
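
   The ordering argument can be illustrated with a small hypothetical model: the producer's partitions only become requestable once `registerTask` has run, and the consumer keeps retrying for a fixed window. The step names, durations, and the 22.7 s window (what a doubling backoff from 100 ms to a 10 s cap would give) are assumptions for illustration, and the model assumes the consumer starts requesting at the same moment the producer starts. It is not Flink's actual `Task` startup code.

```java
import java.util.List;

/**
 * Hypothetical model comparing two task-startup orderings.
 * Step names and durations are illustrative, not Flink's Task code.
 */
public class StartupOrderSketch {

    /** One startup step with an assumed duration in milliseconds. */
    static final class Step {
        final String name;
        final long durationMs;

        Step(String name, long durationMs) {
            this.name = name;
            this.durationMs = durationMs;
        }
    }

    static final long REGISTER_MS = 10; // assumed: registration itself is cheap

    /** Order described in the comment: blob cache work first, then registerTask. */
    static List<Step> blobCacheFirst(long blobDownloadMs) {
        return List.of(new Step("downloadBlobs", blobDownloadMs),
                       new Step("registerTask", REGISTER_MS));
    }

    /** Proposed order: registerTask first, then blob cache work. */
    static List<Step> registerTaskFirst(long blobDownloadMs) {
        return List.of(new Step("registerTask", REGISTER_MS),
                       new Step("downloadBlobs", blobDownloadMs));
    }

    /** Time (ms after producer start) when its partitions become requestable. */
    static long registrationTimeMs(List<Step> steps) {
        long clock = 0;
        for (Step step : steps) {
            clock += step.durationMs;
            if (step.name.equals("registerTask")) {
                return clock;
            }
        }
        throw new IllegalArgumentException("no registerTask step");
    }

    /** True if the consumer is still retrying when the producer registers. */
    static boolean consumerSucceeds(List<Step> steps, long consumerRetryWindowMs) {
        return registrationTimeMs(steps) <= consumerRetryWindowMs;
    }

    public static void main(String[] args) {
        long blobDownloadMs = 30_000; // assumed: slow download of large jars
        long retryWindowMs = 22_700;  // assumed consumer retry window
        System.out.println("blob cache first:   "
                + consumerSucceeds(blobCacheFirst(blobDownloadMs), retryWindowMs));
        System.out.println("registerTask first: "
                + consumerSucceeds(registerTaskFirst(blobDownloadMs), retryWindowMs));
    }
}
```

   In this model, a 30 s blob download pushes registration past the consumer's window in the current order, while registering first makes the partitions requestable after about 10 ms no matter how long the download takes.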

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services