Posted to issues-all@impala.apache.org by "Wenzhe Zhou (Jira)" <ji...@apache.org> on 2020/10/20 23:22:00 UTC

[jira] [Commented] (IMPALA-10270) Failed Task Blacklisting

    [ https://issues.apache.org/jira/browse/IMPALA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217996#comment-17217996 ] 

Wenzhe Zhou commented on IMPALA-10270:
--------------------------------------

Query failures can be caused by file or disk-sector corruption, RPC timeouts, network partitions, or exceeding resource limits such as memory, file handles, or thread pool capacity. Some of these causes are temporary and may clear up quickly. We should classify the errors and set a different blacklisting timeout for each class.
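
As a rough illustration of the classification idea, here is a minimal Python sketch. It is not Impala code: the class names, the cause strings, the cause-to-class mapping, and the timeout values are all hypothetical and chosen only to show the shape of the approach.

```python
from enum import Enum


class FailureClass(Enum):
    """Hypothetical failure classes; not taken from the Impala code base."""
    QUERY_SPECIFIC = "query_specific"  # caused by the query or its data, not the node
    TRANSIENT = "transient"            # likely to clear up quickly
    PERSISTENT = "persistent"          # likely to need intervention


# Assumed mapping from failure cause to class (illustrative only).
CAUSE_TO_CLASS = {
    "corrupt_file": FailureClass.QUERY_SPECIFIC,
    "mem_limit_exceeded": FailureClass.QUERY_SPECIFIC,
    "rpc_timeout": FailureClass.TRANSIENT,
    "thread_pool_exhausted": FailureClass.TRANSIENT,
    "network_partition": FailureClass.PERSISTENT,
    "disk_sector_corruption": FailureClass.PERSISTENT,
}

# Assumed blacklist durations in seconds; 0 means "do not blacklist".
BLACKLIST_TIMEOUT_S = {
    FailureClass.QUERY_SPECIFIC: 0,
    FailureClass.TRANSIENT: 30,
    FailureClass.PERSISTENT: 600,
}


def blacklist_timeout_s(cause: str) -> int:
    """Return how long to blacklist a node for the given failure cause.

    Unknown causes are treated as query-specific (no blacklisting), which is
    a conservative choice made for this sketch.
    """
    failure_class = CAUSE_TO_CLASS.get(cause, FailureClass.QUERY_SPECIFIC)
    return BLACKLIST_TIMEOUT_S[failure_class]
```

With a table like this, a short-lived problem such as an RPC timeout keeps a node out of scheduling only briefly, while a longer-lived one such as a network partition keeps it out longer.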

> Failed Task Blacklisting
> ------------------------
>
>                 Key: IMPALA-10270
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10270
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> * If node “a” has a higher rate of task failures compared to the rest of the cluster, then node “a” should be blacklisted
>  * Only a specific set of failures should count against a node; e.g. query-specific failures such as reading corrupted files or exceeding the memory limit should not count
>  * This is similar to how Spark Executor Blacklisting works
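
The rule in the description above could be sketched roughly as follows. This is a hypothetical Python illustration, not Impala or Spark code: the set of counted causes, the thresholds, and the function name are assumptions made for the example.

```python
from collections import Counter

# Assumed set of failure causes that count against a node. Query-specific
# causes such as corrupted files or mem limit exceeded are left out on purpose.
COUNTED_FAILURES = frozenset({"rpc_timeout", "disk_io_error"})


def nodes_to_blacklist(failures, tasks_per_node, rate_factor=3.0,
                       min_tasks=10, min_failures=3):
    """Return the nodes whose failure rate is well above the rest of the cluster.

    failures: iterable of (node, cause) pairs.
    tasks_per_node: dict mapping node -> total number of tasks it ran.
    rate_factor, min_tasks, min_failures: hypothetical tuning knobs.
    """
    counted = Counter(node for node, cause in failures
                      if cause in COUNTED_FAILURES)
    # Only consider nodes that ran enough tasks for a rate to be meaningful.
    rates = {node: counted[node] / total
             for node, total in tasks_per_node.items()
             if total >= min_tasks}
    flagged = set()
    for node, rate in rates.items():
        others = [r for n, r in rates.items() if n != node]
        if not others or counted[node] < min_failures:
            continue
        baseline = sum(others) / len(others)  # mean rate of the other nodes
        if rate > rate_factor * baseline:
            flagged.add(node)
    return flagged
```

For example, with nodes "a", "b", and "c" each running 100 tasks, 20 RPC timeouts on "a", one on "b", and five mem-limit failures on "c", only "a" is flagged; the failures on "c" are ignored because they are query-specific.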



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
