Posted to issues@tajo.apache.org by "Keuntae Park (JIRA)" <ji...@apache.org> on 2014/02/20 06:26:19 UTC
[jira] [Created] (TAJO-613) Hedging against unusually slow TajoWorker
Keuntae Park created TAJO-613:
---------------------------------
Summary: Hedging against unusually slow TajoWorker
Key: TAJO-613
URL: https://issues.apache.org/jira/browse/TAJO-613
Project: Tajo
Issue Type: Improvement
Reporter: Keuntae Park
When one of the disks in my Tajo cluster becomes unhealthy (that is, it responds slowly because of a hardware problem), query processing becomes extremely slow.
The following is the kernel log of the server that has the unhealthy disk:
{noformat}
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Unhandled error code
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] CDB: Read(16): 88 00 00 00 00 01 57 ec 66 32 00 00 01 00 00 00
...
{noformat}
Because of this problem, a TaskRunner that normally takes less than 3 seconds for the given query took 1700 seconds, and the total query execution time, normally about 70 seconds, rose to 1750 seconds.
I think Tajo needs a mechanism like the speculative execution in MapReduce.
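
To make the idea concrete, here is a rough sketch of the kind of check such a mechanism could start from: flag a running task as a straggler when its elapsed time far exceeds the median duration of tasks that have already completed, so that a backup attempt can be launched on another worker. This is not existing Tajo code. The class name StragglerDetector, its methods, and the slowdownFactor and minSamples parameters are all hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, NOT part of Tajo: decides whether a running task
// looks unusually slow compared with tasks that have already finished.
final class StragglerDetector {
    private StragglerDetector() {}

    // Median of the completed task durations, in milliseconds.
    static double median(List<Long> durationsMs) {
        if (durationsMs.isEmpty()) {
            throw new IllegalArgumentException("no completed tasks");
        }
        List<Long> sorted = new ArrayList<>(durationsMs);
        Collections.sort(sorted);
        int n = sorted.size();
        int mid = n / 2;
        return (n % 2 == 1)
                ? sorted.get(mid)
                : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    // A task is treated as a straggler when enough tasks have completed to
    // give a meaningful baseline (minSamples, an assumed parameter) and its
    // elapsed time exceeds slowdownFactor times the median completed duration.
    static boolean isStraggler(long elapsedMs, List<Long> completedMs,
                               double slowdownFactor, int minSamples) {
        if (completedMs.size() < minSamples) {
            return false; // too little data to judge
        }
        return elapsedMs > slowdownFactor * median(completedMs);
    }
}
```

With a check like this, a task that has been running for 1700 seconds while its peers finish in about 3 seconds would be flagged long before it completes. The scheduler could then start a duplicate attempt on a different worker and keep whichever attempt finishes first. The threshold values would need tuning, and a median is used here rather than a mean only so that one slow task does not distort the baseline.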
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)