Posted to user@hive.apache.org by William Kornfeld <wk...@baynote.com> on 2011/11/30 01:27:29 UTC

Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)

We are building an application that involves chains of M/R jobs, most likely all written in Hive.  We need to start a Hive job when one or more prerequisite data sets appear (defined in the Hive sense as a new partition having been populated with data), or when a particular time has been reached.
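
For readers comparing schedulers, the start condition described above can be written down as a small predicate. This is only a minimal sketch under stated assumptions: `should_start` and the partition strings such as `dt=2011-11-29` are hypothetical names chosen for illustration, no Hive or scheduler API is used, and how the set of populated partitions is obtained is left open.

```python
from datetime import datetime


def should_start(required, available, now, deadline=None):
    """Return True when the job may start.

    required  -- partition specs the job depends on (hypothetical strings)
    available -- partition specs known to be populated
    now       -- current time
    deadline  -- optional time at which the job starts regardless of data
    """
    # Data trigger: every prerequisite partition has been populated.
    data_ready = set(required) <= set(available)
    # Time trigger: the configured time has been reached.
    time_reached = deadline is not None and now >= deadline
    return data_ready or time_reached


if __name__ == "__main__":
    # Hypothetical example: one prerequisite partition, already populated.
    print(should_start(["dt=2011-11-29"], ["dt=2011-11-29"], datetime.now()))
```

Whichever scheduler is chosen would need to express both halves of this condition: a data-availability check and a time-based fallback.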

We know of two scheduling packages that appear to solve this problem: Oozie & Pentaho (to which my company has a license).

Does anyone have actual experience using either of these (or something else) to schedule Hive jobs?

William Kornfeld
Baynote


Re: Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)

Posted by Aaron Sun <aa...@gmail.com>.
Azkaban is worth a look.


Re: Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)

Posted by Jasper Knulst <ja...@incentro.com>.
Hi William,

I have hands-on experience with Pentaho for Hadoop, specifically the PDI
(Pentaho Data Integration) module. It has components (called "steps") that
can check whether a file exists (in HDFS or somewhere else). If the file is
not there yet, you can check again every X minutes. A time-based trigger is
also possible.
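
The check-and-retry pattern described above can be sketched outside of any particular tool. This is not how PDI implements its steps; it is a generic polling loop, and `wait_until_ready`, `is_ready`, `interval_seconds`, and `max_checks` are hypothetical names. The existence check itself is passed in as a callable, so it could, for example, wrap a call to `hadoop fs -test -e <path>`.

```python
import time


def wait_until_ready(is_ready, interval_seconds, max_checks, sleep=time.sleep):
    """Poll is_ready() until it returns True or max_checks is exhausted.

    is_ready         -- zero-argument callable, e.g. a file-existence check
    interval_seconds -- pause between checks (the "X minutes" above)
    max_checks       -- upper bound on the number of checks
    sleep            -- injectable for testing; defaults to time.sleep
    """
    for attempt in range(max_checks):
        if is_ready():
            return True
        # Do not sleep after the final unsuccessful check.
        if attempt < max_checks - 1:
            sleep(interval_seconds)
    return False
```

Bounding the number of checks keeps a missing input from blocking the chain forever; the caller decides what to do when the function returns False.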

Cheers

Jasper

-- 

Jasper Knulst
Consultant | Incentro Den Haag

Gildeweg 5B
2632 BD Nootdorp
The Netherlands

E: jasper.knulst@incentro.com
T: +31157640750
M: +31619667511
W: www.incentro.com


Re: Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
William,

Oozie workflow jobs support Hive actions and Oozie coordinator jobs support
time/data activation of workflow jobs.

Cheers.

Alejandro
