Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2011/03/21 19:49:05 UTC

[jira] [Commented] (HIVE-2026) Parallelize UpdateInputAccessTimeHook

    [ https://issues.apache.org/jira/browse/HIVE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009281#comment-13009281 ] 

Namit Jain commented on HIVE-2026:
----------------------------------

Ning, do you want the new parameter to use the new configuration variable, i.e. the number of threads?
I mean, ExecDriver can invoke the hooks in parallel and then wait for them. That way, if we have
a new hook with similar requirements in the future, we don't have to duplicate this code.
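
To make the suggestion concrete, here is a rough sketch of a shared helper that runs a list of hooks in parallel and waits for all of them. It is only an illustration: ParallelHookRunner and runAll are hypothetical names, hooks are modeled as plain Callable<Void> rather than Hive's hook interfaces, and numThreads stands in for the proposed configuration variable.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hive: runs hooks concurrently and waits for all.
class ParallelHookRunner {

    // numThreads is assumed to come from a configuration variable and must be > 0.
    static void runAll(List<Callable<Void>> hooks, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // invokeAll blocks until every hook has either completed or failed.
            List<Future<Void>> results = pool.invokeAll(hooks);
            for (Future<Void> result : results) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    // Surface the first hook failure (in submission order) to the caller.
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;
                    }
                    throw e;
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that with this design every hook runs to completion even if an earlier one fails, which differs from strictly sequential execution where a failure stops the remaining hooks.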

> Parallelize UpdateInputAccessTimeHook
> -------------------------------------
>
>                 Key: HIVE-2026
>                 URL: https://issues.apache.org/jira/browse/HIVE-2026
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-2026.patch, HIVE-2026_2.patch
>
>
> UpdateInputAccessTimeHook is usually used as a pre-execution hook to update the metastore's lastAccessTime field of the input partition/table. If a query touches a large number of partitions, this hook takes a long time to execute. One approach is to make the hook itself run in a separate thread, but it is hard to guarantee backward compatibility in semantics when exceptions are encountered during hook execution. This task takes another approach: parallelize the work inside the hook (update multiple partitions concurrently), but still execute each pre-hook in sequential order.
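
The approach described above can be sketched roughly as follows. This is not the code from the attached patches: ParallelAccessTimeUpdater and AccessTimeStore are hypothetical names, and AccessTimeStore is an assumed stand-in for whatever metastore call sets lastAccessTime. The point is that updateAll returns only after every partition update has finished, so the caller can still run its pre-hooks one after another and see any exception synchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: parallelism lives inside one hook, not across hooks.
class ParallelAccessTimeUpdater {

    // Assumed stand-in for the metastore update of lastAccessTime.
    interface AccessTimeStore {
        void setLastAccessTime(String partitionName, int accessTimeSeconds) throws Exception;
    }

    private final AccessTimeStore store;
    private final int numThreads; // assumed to come from configuration, must be > 0

    ParallelAccessTimeUpdater(AccessTimeStore store, int numThreads) {
        this.store = store;
        this.numThreads = numThreads;
    }

    // Updates all partitions concurrently, then blocks until every update is done.
    void updateAll(List<String> partitions, final int accessTimeSeconds) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (final String partition : partitions) {
                Callable<Void> task = () -> {
                    store.setLastAccessTime(partition, accessTimeSeconds);
                    return null;
                };
                pending.add(pool.submit(task));
            }
            for (Future<Void> f : pending) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    // Rethrow the underlying failure in the calling thread.
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;
                    }
                    throw e;
                }
            }
        } finally {
            // Cancel anything still queued if we are leaving early on a failure.
            pool.shutdownNow();
        }
    }
}
```

Because the calling thread waits on every Future before returning, a failure in any partition update reaches the caller as an ordinary exception from the hook.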

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira