Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/08/25 00:42:59 UTC

[jira] Created: (HIVE-788) Triggers when a new partition is created for a table

Triggers when a new partition is created for a table
----------------------------------------------------

                 Key: HIVE-788
                 URL: https://issues.apache.org/jira/browse/HIVE-788
             Project: Hadoop Hive
          Issue Type: New Feature
            Reporter: Zheng Shao


One requirement for HIVE-787 is that users would like to run a command whenever a new partition of a Hive table gets created.

There are several ways to achieve this functionality:

A. Probe and wait: We can have the scripts running in a loop checking if a new partition is created.
  Pros: easy to write, easy to control
  Cons: introduces an additional delay that depends on the probing interval.

B. Triggered: The command is registered inside the hive metastore. Whenever a partition gets created, we run the registered command. 

Several questions around option B are:
1. whether to support registering HiveQL or shell commands;
2. on which machine/environment to run the command;
3. what to do if the registered command fails.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-788) Triggers when a new partition is created for a table

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747679#action_12747679 ] 

Edward Capriolo commented on HIVE-788:
--------------------------------------

>>what are the types of trigger we want to support now

I have an idea. HiveHistory already has events in both object and textual form. Would the events above appear in a HiveHistory message? If so, could we fire each HiveHistory event to an external script?






[jira] Commented: (HIVE-788) Triggers when a new partition is created for a table

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747660#action_12747660 ] 

Zheng Shao commented on HIVE-788:
---------------------------------

Edward, yes, by A I mean that users will manage and run a script themselves, and the script will call a Hive command which waits for a new partition to appear.

Talked offline with Ashish about the 3 questions of B.
1. We will support shell commands for now. The reason is that users can hook the trigger up with some other existing job/process management tool to monitor the status of the triggered command.
2. The MoveTask (which calls db.loadTable and db.loadPartition) will run the shell command on the same machine that loads the table/partition, since there may not be a HiveServer available.
3. If the shell command fails, the move task will return failure (even though the new table/partition has already been created, or the data already updated in the case of overwrite). This is also the simple choice because we don't have the concept of transactions/rollback yet.

So it seems B will be a better way to go.
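
Points 2 and 3 above could be sketched roughly as follows. This is only an illustration in Python, not Hive code: `run_trigger_after_load` and `load_fn` are hypothetical names, with `load_fn` standing in for the db.loadTable / db.loadPartition call. The shell command runs in the same process environment as the load, and a failing command makes the task report failure without undoing the load.

```python
import subprocess


def run_trigger_after_load(load_fn, trigger_command):
    """Run the load, then the registered shell command; return overall success.

    Hypothetical helper: load_fn stands in for db.loadTable / db.loadPartition.
    """
    load_fn()  # the data is in place from here on; nothing below undoes it
    result = subprocess.run(trigger_command, shell=True)
    # A non-zero exit status fails the task, but there is no rollback.
    return result.returncode == 0


# Stand-in for the warehouse: a list of loaded partitions.
loaded = []
ok = run_trigger_after_load(lambda: loaded.append("ds=2009-08-25"), "exit 0")
failed = run_trigger_after_load(lambda: loaded.append("ds=2009-08-26"), "exit 1")
# Both partitions stay loaded even though the second trigger failed.
```

Returning the exit status to the caller is what would let an external job management tool notice and retry the failed command.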


The next question would be, what are the types of trigger we want to support now:
1. On new partition creation in a specified table
2. On data change (overwrite/append) in a specified table (or any partitions of a specified table)

There might be more, but these two seem to be the most wanted.
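
One way to picture the two trigger types is a small registry keyed by table and event type. The sketch below is purely illustrative: `TriggerEvent`, `TriggerRegistry`, the table name and the script names are all made up for the example and are not part of Hive or its metastore.

```python
from collections import defaultdict
from enum import Enum


class TriggerEvent(Enum):
    NEW_PARTITION = "new_partition"  # type 1: a new partition is created
    DATA_CHANGE = "data_change"      # type 2: overwrite/append in a table or partition


class TriggerRegistry:
    """Hypothetical in-memory registry of shell commands per (table, event)."""

    def __init__(self):
        self._commands = defaultdict(list)

    def register(self, table, event, command):
        self._commands[(table, event)].append(command)

    def commands_for(self, table, event):
        # .get avoids creating empty entries for tables with no triggers.
        return list(self._commands.get((table, event), []))


registry = TriggerRegistry()
registry.register("page_views", TriggerEvent.NEW_PARTITION, "notify.sh page_views")
registry.register("page_views", TriggerEvent.DATA_CHANGE, "refresh.sh page_views")
```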




[jira] Commented: (HIVE-788) Triggers when a new partition is created for a table

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747158#action_12747158 ] 

Edward Capriolo commented on HIVE-788:
--------------------------------------

It would also be nice to trigger when a new file is added to the table. I do a lot of "load data infile" into already existing partitions.



[jira] Commented: (HIVE-788) Triggers when a new partition is created for a table

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747185#action_12747185 ] 

Edward Capriolo commented on HIVE-788:
--------------------------------------

I reviewed http://dev.mysql.com/doc/refman/5.1/en/trigger-syntax.html and it looks pretty cool. Would a trigger be implemented as a Java class implementing an interface, or would the functionality be provided by a meta language? I understand how B would work. Probe and wait is tricky: which component would be doing the probing? What about concurrent probes? It seems like users can write a probe-and-wait loop themselves outside of Hive.



[jira] Commented: (HIVE-788) Triggers when a new partition is created for a table

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747177#action_12747177 ] 

Ashish Thusoo commented on HIVE-788:
------------------------------------

The idea here is similar to DML triggers. If we do this as triggers on the insert and load statements, then we can address Edward's use case as well. Basically, whenever data is added to a table, a trigger gets executed. For DML triggers, look at

http://dev.mysql.com/doc/refman/5.1/en/trigger-syntax.html

The DML triggers may be before or after inserts. You can find similar examples with Oracle.

One variation is obviously to execute a user script when the trigger fires.

A is better if the number of such scripts is low, but if more and more users use these mechanisms it quickly becomes non-performant and would put a lot of load on the metastore. Traditionally B has been the model in most RDBMSs, so when we call these triggers, users would perhaps expect similar functionality.
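
To get a rough feel for the metastore load under A, here is a back-of-the-envelope calculation. The script counts and the 60-second interval are made-up numbers for illustration, and `probes_per_day` is a hypothetical helper that assumes one metastore query per probe.

```python
def probes_per_day(num_scripts, interval_seconds):
    """Metastore queries per day if each script probes once per interval."""
    seconds_per_day = 86400
    # Integer division: counts only whole intervals that fit in a day.
    return num_scripts * (seconds_per_day // interval_seconds)


few = probes_per_day(10, 60)     # 14400
many = probes_per_day(1000, 60)  # 1440000
```

Under these assumptions the probe traffic grows linearly with the number of scripts, whether or not any partition is actually created.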




[jira] Commented: (HIVE-788) Triggers when a new partition is created for a table

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747151#action_12747151 ] 

Zheng Shao commented on HIVE-788:
---------------------------------

For A, we can add a command to Hive that returns a partition whose partition keys are greater than those of a given partition. Something like "wait for next partition of table xxx [greater than ds=yyy, ts=zzz]". This command will return the first partition after the given partition. If the wait command is implemented natively in Hive, the cost of each probe will be just a SQL query on the metastore (so the probe interval can be something like one minute or less).

Once we get the new partition, the client can decide what to do. This can avoid hard questions like B.1, B.2, B.3.
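
The behaviour of such a wait command could be sketched as below. This is an illustration in Python, not a proposed implementation: `next_partition_after`, `wait_for_next_partition` and `list_partitions` are hypothetical names, partitions are modelled as (ds, ts) tuples of strings compared lexicographically, and `list_partitions` stands in for the SQL query on the metastore.

```python
import time


def next_partition_after(partitions, last_seen):
    """Return the smallest partition key tuple greater than last_seen, or None."""
    newer = [p for p in partitions if p > last_seen]
    return min(newer) if newer else None


def wait_for_next_partition(list_partitions, last_seen,
                            interval_seconds=60, max_probes=None):
    """Probe until a partition after last_seen appears, or probes run out."""
    probes = 0
    while max_probes is None or probes < max_probes:
        found = next_partition_after(list_partitions(), last_seen)
        if found is not None:
            return found
        probes += 1
        time.sleep(interval_seconds)
    return None


# Fake metastore: each probe sees the next snapshot of the partition list.
snapshots = iter([
    [("2009-08-24", "23")],
    [("2009-08-24", "23"), ("2009-08-25", "00")],
])
found = wait_for_next_partition(lambda: next(snapshots),
                                ("2009-08-24", "23"), interval_seconds=0)
# found == ("2009-08-25", "00")
```

The client gets the new partition back as a value and decides what to run, where, and how to handle failure.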

