Posted to user@hive.apache.org by William Kornfeld <wk...@baynote.com> on 2011/11/30 01:27:29 UTC
Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)
We are building an application that involves chains of M/R jobs, most likely all written in Hive. We need to start a Hive job when one or more prerequisite data sets appear (defined in the Hive sense as a new partition having been populated with data) or when a particular time has been reached.
We know of two scheduling packages that appear to solve this problem: Oozie and Pentaho (for which my company has a license).
Does anyone have actual experience using either of these (or something else) to schedule Hive jobs?
William Kornfeld
Baynote
Re: Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)
Posted by Aaron Sun <aa...@gmail.com>.
Azkaban is worth a look.
Re: Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)
Posted by Jasper Knulst <ja...@incentro.com>.
Hi William,
I have hands-on experience with Pentaho for Hadoop, that is, the PDI
(Pentaho Data Integration) module. There are components (called
"steps") that can check whether a file exists (in HDFS or elsewhere).
If the file is not there yet, you can check again every X minutes.
A time-based trigger is also possible.
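For readers not using PDI, the same "check again every X minutes, or fire at a fixed time" pattern can be sketched in plain Python. This is a minimal sketch, not how PDI or Oozie implement it: it assumes each prerequisite partition is signalled by a path whose existence can be tested, and the `exists` callable defaults to the local `os.path.exists`. Checking HDFS would need a different `exists` function (hypothetical here). The function name `wait_for_inputs_or_deadline` and its parameters are illustrative only.

```python
import os
import time
from datetime import datetime
from typing import Callable, Iterable


def wait_for_inputs_or_deadline(
    paths: Iterable[str],
    deadline: datetime,
    poll_seconds: float = 300.0,
    exists: Callable[[str], bool] = os.path.exists,
    now: Callable[[], datetime] = datetime.now,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until every prerequisite path exists or the deadline passes.

    Returns "data" if all paths were present first, "time" if the
    deadline was reached first. `exists`, `now` and `sleep` are
    injectable so the loop can be tested without real waiting.
    """
    required = list(paths)
    while True:
        # Data trigger: every prerequisite partition path is present.
        if all(exists(p) for p in required):
            return "data"
        # Time trigger: the scheduled time has been reached.
        if now() >= deadline:
            return "time"
        # Neither condition holds yet; check again after poll_seconds.
        sleep(poll_seconds)
```

For example, `wait_for_inputs_or_deadline(["/data/events/dt=2011-11-29"], datetime(2011, 11, 30, 6, 0))` (a hypothetical partition path, checked every 5 minutes) returns `"data"` as soon as that path exists, or `"time"` once 06:00 local time is reached; either way the caller would then launch the Hive job. The deadline is compared against naive local time because `now` defaults to `datetime.now`.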
Cheers
Jasper
--
*Jasper Knulst*
Consultant *|* Incentro Den Haag
Gildeweg 5B
2632 BD Nootdorp
The Netherlands
*E:* jasper.knulst@incentro.com
*T:* +31157640750
*M:* +31619667511
*W:* www.incentro.com
Re: Scheduling Hive Jobs (Oozie vs. Pentaho vs. something else)
Posted by Alejandro Abdelnur <tu...@cloudera.com>.
William,
Oozie workflow jobs support Hive actions and Oozie coordinator jobs support
time/data activation of workflow jobs.
Cheers.
Alejandro