Posted to issues@celeborn.apache.org by "waitinfuture (via GitHub)" <gi...@apache.org> on 2023/01/30 09:31:29 UTC

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1180: [CELEBORN-238][IMPROVEMENT] Push data to master partition timeout should add to blacklist and reserve for some time

waitinfuture commented on PR #1180:
URL: https://github.com/apache/incubator-celeborn/pull/1180#issuecomment-1408263069

   Currently we have no evidence whether a pushdata timeout is caused by the main or the slave. I think we can refine the design so that ```celeborn.push.data.timeout``` means the **ONE WAY** timeout of pushdata: if replication is on, the timeout in the client should be ```2 * celeborn.push.data.timeout```, and the timeout of the replication should be ```celeborn.push.data.timeout```. If the replication times out, we can pass the message back to the client and blacklist the slave; otherwise we should blacklist the main.
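
   A minimal sketch of the rule proposed above, to make the arithmetic and the blacklist decision concrete. The class, enum, and method names are hypothetical and are not taken from the Celeborn code base; only the idea of a one-way ```celeborn.push.data.timeout``` comes from the comment, and the 120-second value is a placeholder.

```java
import java.time.Duration;

public class PushTimeoutSketch {
    // Hypothetical label for which worker to blacklist after a timeout.
    public enum BlacklistTarget { MAIN, SLAVE }

    // Client-side wait: one hop (client -> main), plus a second hop
    // (main -> slave) when replication is on.
    public static Duration clientTimeout(Duration oneWay, boolean replicationEnabled) {
        return replicationEnabled ? oneWay.multipliedBy(2) : oneWay;
    }

    // The main waits exactly one one-way timeout for the slave.
    public static Duration replicationTimeout(Duration oneWay) {
        return oneWay;
    }

    // If the main reported back that replication timed out, the slave is the
    // suspect; if the client simply timed out with no such report, the main is.
    public static BlacklistTarget targetOnTimeout(boolean replicaTimeoutReported) {
        return replicaTimeoutReported ? BlacklistTarget.SLAVE : BlacklistTarget.MAIN;
    }

    public static void main(String[] args) {
        // Placeholder standing in for celeborn.push.data.timeout.
        Duration oneWay = Duration.ofSeconds(120);
        System.out.println("client timeout (replication on):  " + clientTimeout(oneWay, true));
        System.out.println("client timeout (replication off): " + clientTimeout(oneWay, false));
        System.out.println("replication timeout:              " + replicationTimeout(oneWay));
        System.out.println("replica timeout reported -> " + targetOnTimeout(true));
        System.out.println("no report from main      -> " + targetOnTimeout(false));
    }
}
```

   The point of doubling the client timeout is that the main's replication timeout then always fires before the client's own, so the main has a chance to report the slave before the client gives up.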

