Posted to dev@hive.apache.org by "Russell Jurney (JIRA)" <ji...@apache.org> on 2010/07/15 19:57:50 UTC

[jira] Commented: (HIVE-1107) Generic parallel execution framework for Hive (and Pig, and ...)

    [ https://issues.apache.org/jira/browse/HIVE-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888870#action_12888870 ] 

Russell Jurney commented on HIVE-1107:
--------------------------------------

At Jeff's suggestion, my comments on this ticket for Hive and Pig follow.

Oozie has been suggested as a solution to this ticket, but in my opinion it is far too complex to be appropriate for Pig or Hive.  A scheduler should not be more complex than the language it schedules, and Oozie is more complex than Pig and Hive put together.  Compare their manuals, in both length and readability.  Furthermore, Oozie is (nearly?) Turing-complete XML, not an easily human-readable script, and scheduling a single job takes far too much of it.

Pig and Hive aim to deliver simplicity and accessibility.  In time Oozie may mature, but it is not there yet.  The features are present, but the open source interface is extremely raw.  The only simple interface to Oozie is a proprietary GUI.  Perhaps the next major release will be an improvement.

A tight binding between these projects would cause problems for LinkedIn, as we use Azkaban to schedule Pig jobs.  Scheduling a job in Azkaban consists of creating a zip file of your job's content, inserting a very brief config (typically 3-6 lines), and issuing a one-line command.  The web interface to Azkaban is free.  This makes it a more appropriate choice for this ticket than Oozie, but binding Azkaban tightly to Pig would be a terrible idea too.

We should be very careful about adding enterprise baggage to these tools that the vast majority of users simply do not need.  Convention over configuration is at the core of Pig and Hive.  Let's not spoil that.

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>
>                 Key: HIVE-1107
>                 URL: https://issues.apache.org/jira/browse/HIVE-1107
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> Pig and Hive each have their own libraries for handling plan execution. As we prepare to invest more time improving Hive's plan execution mechanism we should also start to consider ways of building a generic plan execution mechanism that is capable of supporting the needs of Hive and Pig, as well as other Hadoop data flow programming environments. 
