Posted to dev@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2014/01/03 06:52:52 UTC

[jira] [Created] (TAJO-473) Improve the fault tolerance of LazyTaskScheduler

Jihoon Son created TAJO-473:
-------------------------------

             Summary: Improve the fault tolerance of LazyTaskScheduler 
                 Key: TAJO-473
                 URL: https://issues.apache.org/jira/browse/TAJO-473
             Project: Tajo
          Issue Type: New Feature
          Components: query master
    Affects Versions: 0.2-incubating
            Reporter: Jihoon Son


As discussed in TAJO-385 and https://reviews.apache.org/r/16455/, the LazyTaskScheduler has a problem when tasks fail.
When a failed task that contains multiple fragments is re-assigned to a node, the locality of its fragments is extremely hard to preserve, because it is nearly impossible for every fragment to be stored on two or more common hosts.

A simple and effective solution is to create a separate query unit attempt for each fragment when a failed task is reattempted. To implement this approach, we should maintain the query processing attempt information for each fragment, not for each query unit.
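
A rough sketch of the idea, for discussion only. The class and method names below (FragmentRetrySketch, Fragment, FragmentAttempt, splitFailedTask) are hypothetical and are not Tajo's actual classes; the sketch only assumes that each fragment knows the hosts that store it. A failed multi-fragment task is split into one retry attempt per fragment, and the retry count is tracked per fragment id rather than per query unit:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these types exist in Tajo.
public class FragmentRetrySketch {

    /** A fragment and the hosts that store a replica of it. */
    public static final class Fragment {
        public final String id;
        public final List<String> hosts;

        public Fragment(String id, List<String> hosts) {
            this.id = id;
            this.hosts = hosts;
        }
    }

    /** A retry attempt that processes exactly one fragment. */
    public static final class FragmentAttempt {
        public final Fragment fragment;
        public final int retryNo;

        public FragmentAttempt(Fragment fragment, int retryNo) {
            this.fragment = fragment;
            this.retryNo = retryNo;
        }

        /** Any host holding the fragment preserves locality for this attempt. */
        public List<String> candidateHosts() {
            return fragment.hosts;
        }
    }

    // Retry counters are keyed by fragment id, not by query unit.
    private final Map<String, Integer> retriesByFragment = new HashMap<>();

    /** Splits a failed multi-fragment task into one retry attempt per fragment. */
    public List<FragmentAttempt> splitFailedTask(List<Fragment> failedTaskFragments) {
        List<FragmentAttempt> retries = new ArrayList<>();
        for (Fragment f : failedTaskFragments) {
            // Increment this fragment's retry count (starts at 1 on first failure).
            int retryNo = retriesByFragment.merge(f.id, 1, Integer::sum);
            retries.add(new FragmentAttempt(f, retryNo));
        }
        return retries;
    }
}
```

Because each retry attempt covers a single fragment, the scheduler only needs one host that stores that fragment, instead of a host common to all fragments of the original task.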



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)