Posted to issues@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2018/03/16 20:46:24 UTC

[jira] [Updated] (HIVE-18978) ConditionalTask.addDependentTask(Task t) adds t in the wrong place

     [ https://issues.apache.org/jira/browse/HIVE-18978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-18978:
----------------------------------
    Description: 
{{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
{noformat}
/**
* Add a dependent task on the current conditional task. The task will not be a direct child of
* conditional task. Actually it will be added as child task of associated tasks.
*
* @return true if the task got added false if it already existed
*/
@Override
public boolean addDependentTask(Task<? extends Serializable> dependent) {
  boolean ret = false;
  if (getListTasks() != null) {
    ret = true;
    for (Task<? extends Serializable> tsk : getListTasks()) {
      ret = ret & tsk.addDependentTask(dependent);
    }
  }
  return ret;
}
{noformat}
Suppose the tasks in the ConditionalTask are A, B and C, and each of them has children of its own:
{noformat}
    CondTask
      |--A
         |--A1
            |--A2
      |--B
         |--B1
      |--C
         |--C1
{noformat}
With ConditionalTask.addDependentTask() implemented this way, a new dependent task
(call it MyTask) is added as a child of A, B and C, which makes it a sibling of A1,
B1 and C1. So even if only one branch of the ConditionalTask is executed, when parallel
task execution is enabled there is no guarantee (as far as I can see) that MyTask runs
after A2, B1 or C1, which is what is actually needed.
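
The placement problem can be reproduced with a small stand-alone model. This is only a sketch: SimpleTask, chain, addToBranchRoots and addToLeaves are hypothetical names that do not exist in Hive, and SimpleTask.addDependentTask is assumed to attach the dependent as a direct child, as the Javadoc quoted above suggests. addToLeaves shows one possible alternative placement, not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CondTaskSketch {
  /** Hypothetical stand-in for a Hive task: just a name and a list of child tasks. */
  static class SimpleTask {
    final String name;
    final List<SimpleTask> children = new ArrayList<>();

    SimpleTask(String name) { this.name = name; }

    /** Assumed behavior: attach as a direct child; false if it was already a child. */
    boolean addDependentTask(SimpleTask dependent) {
      if (children.contains(dependent)) {
        return false;
      }
      children.add(dependent);
      return true;
    }
  }

  /** Builds a linear chain such as A -> A1 -> A2 and returns its root. */
  static SimpleTask chain(String... names) {
    SimpleTask root = new SimpleTask(names[0]);
    SimpleTask cur = root;
    for (int i = 1; i < names.length; i++) {
      SimpleTask next = new SimpleTask(names[i]);
      cur.addDependentTask(next);
      cur = next;
    }
    return root;
  }

  /** Same shape as the quoted method: the dependent is attached to each branch root. */
  static boolean addToBranchRoots(List<SimpleTask> listTasks, SimpleTask dependent) {
    boolean ret = false;
    if (listTasks != null) {
      ret = true;
      for (SimpleTask tsk : listTasks) {
        ret = ret & tsk.addDependentTask(dependent);
      }
    }
    return ret;
  }

  /** Possible alternative: attach the dependent to the leaves of every branch. */
  static boolean addToLeaves(List<SimpleTask> listTasks, SimpleTask dependent) {
    if (listTasks == null) {
      return false;
    }
    // Collect all leaves first so the dependent itself is never treated as a leaf.
    Set<SimpleTask> leaves = new LinkedHashSet<>();
    Set<SimpleTask> visited = new HashSet<>();
    for (SimpleTask tsk : listTasks) {
      collectLeaves(tsk, visited, leaves);
    }
    boolean ret = true;
    for (SimpleTask leaf : leaves) {
      ret = ret & leaf.addDependentTask(dependent);
    }
    return ret;
  }

  /** Depth-first walk; the visited set guards against shared nodes in a DAG. */
  static void collectLeaves(SimpleTask t, Set<SimpleTask> visited, Set<SimpleTask> leaves) {
    if (!visited.add(t)) {
      return;
    }
    if (t.children.isEmpty()) {
      leaves.add(t);
      return;
    }
    for (SimpleTask child : t.children) {
      collectLeaves(child, visited, leaves);
    }
  }

  public static void main(String[] args) {
    SimpleTask a = chain("A", "A1", "A2");
    List<SimpleTask> branches = Arrays.asList(a, chain("B", "B1"), chain("C", "C1"));
    addToBranchRoots(branches, new SimpleTask("MyTask"));
    // Print the direct children of A after the current-style insertion.
    for (SimpleTask child : a.children) {
      System.out.println("child of A: " + child.name);
    }
  }
}
```

In this model addToBranchRoots leaves MyTask next to A1 under A, so nothing orders it after A2, whereas addToLeaves hangs it below A2, B1 and C1.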

 

Once this is done, add a .q file test that records a plan for Export from Acid: HIVE-18978

  was:
{{ConditionalTask.addDependentTask(Task t) }} is implemented like this:
{noformat}
/**
* Add a dependent task on the current conditional task. The task will not be a direct child of
* conditional task. Actually it will be added as child task of associated tasks.
*
* @return true if the task got added false if it already existed
*/
@Override
public boolean addDependentTask(Task<? extends Serializable> dependent) {
  boolean ret = false;
  if (getListTasks() != null) {
    ret = true;
    for (Task<? extends Serializable> tsk : getListTasks()) {
      ret = ret & tsk.addDependentTask(dependent);
    }
  }
  return ret;
}
{noformat}


     So let’s say, the tasks in the ConditionalTask are A,B,C, but they have children.
{noformat}
    CondTask
      |--A
         |--A1
            |-A2
      |--B
         |--B1
      |--C
        |--C1
{noformat}

     The way ConditionalTask.addDependent() is implemented, MyTask becomes a sibling of A1,
     B1 and C1.  So even if only 1 branch of ConditionalTask is executed (and parallel task
     execution is enabled), there is no guarantee (as I see) that MyTask runs after A2 or
     B1 or C1, which is really what is needed.



> ConditionalTask.addDependentTask(Task t) adds t in the wrong place
> ------------------------------------------------------------------
>
>                 Key: HIVE-18978
>                 URL: https://issues.apache.org/jira/browse/HIVE-18978
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>
> {{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
> {noformat}
> /**
> * Add a dependent task on the current conditional task. The task will not be a direct child of
> * conditional task. Actually it will be added as child task of associated tasks.
> *
> * @return true if the task got added false if it already existed
> */
> @Override
> public boolean addDependentTask(Task<? extends Serializable> dependent) {
>   boolean ret = false;
>   if (getListTasks() != null) {
>     ret = true;
>     for (Task<? extends Serializable> tsk : getListTasks()) {
>       ret = ret & tsk.addDependentTask(dependent);
>     }
>   }
>   return ret;
> }
> {noformat}
> Suppose the tasks in the ConditionalTask are A, B and C, and each of them has children of its own:
> {noformat}
>     CondTask
>       |--A
>          |--A1
>             |--A2
>       |--B
>          |--B1
>       |--C
>          |--C1
> {noformat}
> With ConditionalTask.addDependentTask() implemented this way, a new dependent task
> (call it MyTask) is added as a child of A, B and C, which makes it a sibling of A1,
> B1 and C1. So even if only one branch of the ConditionalTask is executed, when parallel
> task execution is enabled there is no guarantee (as far as I can see) that MyTask runs
> after A2, B1 or C1, which is what is actually needed.
>  
> Once this is done, add a .q file test that records a plan for Export from Acid: HIVE-18978



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)