Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/20 01:43:20 UTC

[GitHub] gianm commented on issue #5709: Broker resiliency to misbehaving historical nodes

URL: https://github.com/apache/incubator-druid/issues/5709#issuecomment-414175186
 
 
   Hi @peferron,
   
   That scope sounds useful for an initial patch. I think the biggest risk is that queries that are doomed to fail (for example, because they exceed resource limits) will be retried too often, doubling or tripling the load on the cluster depending on how many retries are allowed. Some suggestions to mitigate that:
   
   - Check the error code (if there is one) and don't retry on codes like RESOURCE_LIMIT_EXCEEDED, UNAUTHORIZED, or QUERY_TIMEOUT. (The last one because the overall query timeout has probably already elapsed by then anyway.)
   - Don't retry more than X subqueries per query.
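
A minimal Java sketch of those two checks, assuming one policy object is created per query. The class and method names are hypothetical, and the error-code strings are copied from the suggestion above; the exact codes Druid reports may differ. Treating a missing error code as retryable (subject to the budget) is also an assumption.

```java
import java.util.Set;

/**
 * Hypothetical per-query retry policy (not Druid's actual API).
 * Create one instance per top-level query.
 */
final class SubqueryRetryPolicy {
  // Codes assumed to mean "retrying will fail again", taken from the discussion.
  private static final Set<String> NON_RETRYABLE_CODES =
      Set.of("RESOURCE_LIMIT_EXCEEDED", "UNAUTHORIZED", "QUERY_TIMEOUT");

  private final int maxRetriedSubqueries; // the "X" in "no more than X subqueries per query"
  private int retriedSubqueries = 0;

  SubqueryRetryPolicy(int maxRetriedSubqueries) {
    this.maxRetriedSubqueries = maxRetriedSubqueries;
  }

  /**
   * Decides whether a failed subquery should be retried.
   *
   * @param errorCode error code reported for the failed subquery, or null if there was none
   */
  boolean shouldRetry(String errorCode) {
    if (errorCode != null && NON_RETRYABLE_CODES.contains(errorCode)) {
      return false; // doomed to fail again; a retry would only add load
    }
    if (retriedSubqueries >= maxRetriedSubqueries) {
      return false; // per-query retry budget exhausted
    }
    retriedSubqueries++; // only actual retries consume the budget
    return true;
  }
}
```

Rejecting non-retryable codes before touching the budget means a doomed subquery never uses up a retry slot that a transient failure (say, a historical restarting) could have used.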
   
   Another thing to consider is that results can be partially retrieved (and partially processed) before the query fails midway through. In that case, recovery is probably not possible, since the subquery results have already been mixed into the overall query results; the query may need to be retried from scratch.
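
One way to make the partial-results case explicit, sketched in Java under the assumption that a subquery's rows are streamed into a merge step one at a time: retry a subquery on its own only while none of its rows have been merged, and otherwise escalate so the caller can retry the whole query. `PartialResultAwareRunner`, `MustRestartQueryException`, and the `openSubquery`/`merger` hooks are hypothetical names, not Druid's API.

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical signal: rows were already merged, so only a full-query retry is safe. */
class MustRestartQueryException extends RuntimeException {
  MustRestartQueryException(Throwable cause) {
    super("subquery failed after partial results were merged", cause);
  }
}

final class PartialResultAwareRunner {
  private PartialResultAwareRunner() {}

  /**
   * Streams one subquery's rows into the merge step.
   *
   * @param openSubquery hypothetical hook that (re)issues the subquery to a historical
   * @param merger       hypothetical hook that folds one row into the overall result
   * @param maxAttempts  total attempts allowed for this subquery
   */
  static <T> void run(Supplier<Iterator<T>> openSubquery, Consumer<T> merger, int maxAttempts) {
    boolean mergedAny = false;
    for (int attempt = 1; ; attempt++) {
      try {
        Iterator<T> rows = openSubquery.get();
        while (rows.hasNext()) {
          T row = rows.next();
          merger.accept(row);
          mergedAny = true; // from here on, this subquery can no longer be retried alone
        }
        return; // subquery completed successfully
      } catch (RuntimeException e) {
        if (mergedAny) {
          throw new MustRestartQueryException(e); // rows already mixed into the result
        }
        if (attempt >= maxAttempts) {
          throw e; // nothing merged, but out of attempts
        }
        // Nothing merged yet: safe to loop and re-issue just this subquery.
      }
    }
  }
}
```

The caller would catch `MustRestartQueryException` and decide whether to rerun the entire query, ideally still counting that against the same per-query retry limit.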

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
