Posted to dev@hive.apache.org by "Chaitanya Mishra (JIRA)" <ji...@apache.org> on 2009/11/01 05:14:59 UTC

[jira] Updated: (HIVE-549) Parallel Execution Mechanism

     [ https://issues.apache.org/jira/browse/HIVE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaitanya Mishra updated HIVE-549:
----------------------------------

    Attachment: HIVE-549-v3.patch

There was a race condition in the previous patch of the following form.

Task 3 is a child of Tasks 1 and 2.
Tasks 1 and 2 finish.
The main thread checks whether the child of Task 1 (Task 3) has started and whether it is runnable. It calls initialize() for Task 3 and launches a new thread for it.
The main thread then checks whether the child of Task 2 (also Task 3) has started and whether it is runnable. By this point the separate thread for Task 3 has not yet done any execution, so started evaluates to false and the main thread launches a second instance of Task 3.

Problem: There are 2 instances of Task 3.

Solution: Added a new variable, initialized, to Task.java. initialized is set to true as soon as the function initialize() is called. A task is now launched only if isRunnable() returns true and the task is not yet initialized. Since only the main thread invokes tsk.initialize() and checkLaunch(), there is no race condition here.
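
For illustration, here is a minimal sketch of the guard described above. It is not the code in HIVE-549-v3.patch. Only initialized, initialize(), isRunnable() and checkLaunch() are names taken from this comment; the class ParallelLaunchSketch, the fields done and launches, and the helper addChild() are hypothetical. Starting a worker thread is stood in for by a counter, so the example is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelLaunchSketch {

    /** Hypothetical stand-in for a task in the plan; only what the sketch needs. */
    static class Task {
        final String name;
        final List<Task> parents = new ArrayList<>();
        final List<Task> children = new ArrayList<>();
        boolean initialized = false; // written and read by the main thread only
        boolean done = false;        // assumed flag: the task has finished
        int launches = 0;            // assumed counter: how often it was launched

        Task(String name) { this.name = name; }

        void addChild(Task child) {
            children.add(child);
            child.parents.add(this);
        }

        /** A task is runnable once every parent has finished. */
        boolean isRunnable() {
            for (Task p : parents) {
                if (!p.done) return false;
            }
            return true;
        }

        /** Marks the task as initialized immediately, before any thread runs. */
        void initialize() { initialized = true; }
    }

    /** Launches tsk only if it is runnable and has not been initialized yet. */
    static boolean checkLaunch(Task tsk) {
        if (!tsk.isRunnable() || tsk.initialized) return false;
        tsk.initialize();
        tsk.launches++; // stands in for starting a worker thread
        return true;
    }

    public static void main(String[] args) {
        Task t1 = new Task("1"), t2 = new Task("2"), t3 = new Task("3");
        t1.addChild(t3);
        t2.addChild(t3);
        t1.done = true;
        t2.done = true;

        // The main thread visits the children of each finished parent,
        // so Task 3 is checked twice but launched only once.
        for (Task parent : new Task[] {t1, t2}) {
            for (Task child : parent.children) {
                checkLaunch(child);
            }
        }
        System.out.println("Task 3 launches: " + t3.launches); // prints "Task 3 launches: 1"
    }
}
```

Because the flag is set synchronously inside checkLaunch(), the second check sees initialized == true no matter how long the worker thread takes to begin executing.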

Uploading this patch and deleting the previous two patches.

> Parallel Execution Mechanism
> ----------------------------
>
>                 Key: HIVE-549
>                 URL: https://issues.apache.org/jira/browse/HIVE-549
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: Chaitanya Mishra
>         Attachments: HIVE-549-v3.patch
>
>
> In a massively parallel database system, it would be awesome to also parallelize some of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT statements, effectively you could run those statements in parallel. There's no situation (that I can think of, but I don't have a formal proof) in which the left statement would rely on the right statement, or vice versa. So, they could be run at the same time...and perhaps they should be. Or, perhaps there should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.