Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/02/28 15:11:22 UTC

[jira] [Commented] (TAJO-613) Hedging against unusually slow TajoWorker

    [ https://issues.apache.org/jira/browse/TAJO-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915810#comment-13915810 ] 

Hyunsik Choi commented on TAJO-613:
-----------------------------------

+1 for this issue. We absolutely need a way to handle stragglers. Fortunately, TAJO-589 is in progress. It enables QueryMaster to track the progress of tasks, which will let QueryMaster detect unexpectedly slow tasks, as can occur in large clusters. I believe we can implement straggler handling once TAJO-589 is done.
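
As a rough illustration of the detection step, here is a minimal sketch. It assumes QueryMaster can see the durations of completed tasks and the elapsed time of running ones. The class name, method name, and slowdown factor are hypothetical and not part of Tajo or TAJO-589, and it compares elapsed time against a median rather than using whatever progress values TAJO-589 ends up reporting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Tajo class.
public class StragglerDetector {
    /**
     * Returns the ids of running tasks whose elapsed time exceeds
     * slowdownFactor times the median duration of completed tasks.
     */
    public static List<String> findStragglers(long[] completedMillis,
                                              Map<String, Long> runningElapsedMillis,
                                              double slowdownFactor) {
        List<String> stragglers = new ArrayList<>();
        if (completedMillis.length == 0) {
            return stragglers; // no baseline yet, so nothing can be judged slow
        }
        long[] sorted = completedMillis.clone();
        Arrays.sort(sorted);
        long median = sorted[sorted.length / 2]; // upper median for even counts
        double threshold = median * slowdownFactor;
        for (Map.Entry<String, Long> e : runningElapsedMillis.entrySet()) {
            if (e.getValue() > threshold) {
                stragglers.add(e.getKey());
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        // Completed tasks took about 3 seconds each, as in the report.
        long[] completed = {2500, 2800, 3000, 2700, 2900};
        Map<String, Long> running = new LinkedHashMap<>();
        running.put("task-7", 3100L);       // slightly slow, still under the threshold
        running.put("task-9", 1_700_000L);  // the 1700-second task from the report
        System.out.println(findStragglers(completed, running, 3.0));
    }
}
```

A median is used as the baseline here because one very slow task would distort a mean. The right factor would have to be tuned on a real cluster.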

> Hedging against unusually slow TajoWorker
> -----------------------------------------
>
>                 Key: TAJO-613
>                 URL: https://issues.apache.org/jira/browse/TAJO-613
>             Project: Tajo
>          Issue Type: Improvement
>            Reporter: Keuntae Park
>
> When one of the disks in my Tajo cluster becomes unhealthy (that is, its response time is slow due to a hardware problem), query processing becomes extremely slow.
> The following is the kernel log of the server that has the unhealthy disk:
> {noformat}
> Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Unhandled error code
> Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] CDB: Read(16): 88 00 00 00 00 01 57 ec 66 32 00 00 01 00 00 00
> ...
> {noformat}
> Because of this problem, a TaskRunner that normally takes less than 3 seconds for the given query takes 1700 seconds, and the total query execution time rises from the usual 70 seconds to 1750 seconds.
> I think Tajo needs a mechanism like MapReduce's speculative execution.
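
For illustration only, the hedging idea in the description above could be sketched as follows: start the original attempt, and if it has not finished within a delay, start a backup attempt and take whichever finishes first. This is a sketch built on the standard java.util.concurrent API, not Tajo's or MapReduce's actual mechanism. `SpeculativeRun`, `runWithBackup`, `attempt`, and the worker names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not a Tajo class.
public class SpeculativeRun {
    /**
     * Runs primary; if it has not finished after hedgeDelayMillis, also runs
     * backup and returns the result of whichever attempt finishes first.
     * If that first finisher failed, its exception propagates; a fuller
     * version would wait for the other attempt instead.
     */
    public static <T> T runWithBackup(Callable<T> primary, Callable<T> backup,
                                      long hedgeDelayMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<T> attempts = new ExecutorCompletionService<>(pool);
            attempts.submit(primary);
            // poll returns null if nothing completed within the delay
            Future<T> done = attempts.poll(hedgeDelayMillis, TimeUnit.MILLISECONDS);
            if (done == null) {
                attempts.submit(backup);  // primary looks like a straggler
                done = attempts.take();   // block until either attempt finishes
            }
            return done.get();
        } finally {
            pool.shutdownNow();           // interrupt the attempt that lost
        }
    }

    /** Simulated task attempt: sleeps for delayMillis, then reports which worker ran it. */
    public static Callable<String> attempt(String worker, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return worker;
        };
    }

    public static void main(String[] args) throws Exception {
        String winner = runWithBackup(attempt("slow-worker", 5_000),
                                      attempt("backup-worker", 50), 100);
        System.out.println(winner);
    }
}
```

In a real cluster the backup attempt would have to run on a different worker, and the losing attempt's partial output would need to be discarded. Both are outside the scope of this sketch.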



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)