Posted to issues@tajo.apache.org by "Keuntae Park (JIRA)" <ji...@apache.org> on 2014/02/20 06:26:19 UTC

[jira] [Created] (TAJO-613) Hedging against unusually slow TajoWorker

Keuntae Park created TAJO-613:
---------------------------------

             Summary: Hedging against unusually slow TajoWorker
                 Key: TAJO-613
                 URL: https://issues.apache.org/jira/browse/TAJO-613
             Project: Tajo
          Issue Type: Improvement
            Reporter: Keuntae Park


When one of the disks in my Tajo cluster becomes unhealthy (that is, it responds slowly because of a hardware problem), query processing becomes extremely slow.

The following is the kernel log of the server with the unhealthy disk:
{noformat}
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Unhandled error code
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] CDB: Read(16): 88 00 00 00 00 01 57 ec 66 32 00 00 01 00 00 00
...
{noformat}

Because of this problem, a TaskRunner that normally takes less than 3 seconds for the given query takes 1700 seconds, and the total query execution time grows from the usual 70 seconds to 1750 seconds.

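I think Tajo needs a mechanism like MapReduce's speculative execution, which launches a backup attempt of an unusually slow task on another node and keeps whichever attempt finishes first.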
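As a rough illustration of what the first half of such a mechanism might look like, here is a minimal Java sketch of straggler detection. It assumes the query master knows the durations of completed tasks and the elapsed time of still-running tasks; the class StragglerDetector, the constants SLOWNESS_FACTOR and MIN_COMPLETED, and the "3x the median" rule are all hypothetical. This is not an existing Tajo API, and it is not the exact heuristic Hadoop MapReduce uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical straggler detector: flags running tasks whose elapsed time
 * exceeds SLOWNESS_FACTOR times the median duration of completed tasks.
 */
public class StragglerDetector {
    // Assumed threshold; a real scheduler would make this configurable.
    static final double SLOWNESS_FACTOR = 3.0;
    // Wait for a few completed tasks before trusting the median.
    static final int MIN_COMPLETED = 3;

    static double median(long[] durations) {
        long[] sorted = durations.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Returns ids of running tasks that are candidates for a speculative backup attempt. */
    static List<String> findStragglers(long[] completedMillis, Map<String, Long> runningElapsedMillis) {
        List<String> stragglers = new ArrayList<>();
        if (completedMillis.length < MIN_COMPLETED) {
            return stragglers; // not enough history to judge what "slow" means
        }
        double threshold = SLOWNESS_FACTOR * median(completedMillis);
        for (Map.Entry<String, Long> e : runningElapsedMillis.entrySet()) {
            if (e.getValue() > threshold) {
                stragglers.add(e.getKey());
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        // Completed tasks took roughly 2-3 seconds, as in the report above.
        long[] completed = {2100, 2500, 2800, 2900};
        Map<String, Long> running = new TreeMap<>();
        running.put("task-7", 2600L);
        running.put("task-9", 1_700_000L); // e.g. stuck reading from the failing disk
        System.out.println(findStragglers(completed, running)); // prints [task-9]
    }
}
```

Detection is only half of speculative execution: a scheduler would still need to launch a backup attempt of each flagged task on a different TajoWorker, take the output of whichever attempt finishes first, and kill the other. Comparing against the median rather than the mean keeps a single 1700-second straggler from inflating the threshold it is judged against.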



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)