Posted to dev@falcon.apache.org by Alex Nastetsky <an...@spryinc.com> on 2014/09/09 17:31:26 UTC

falcon vs oozie

Hi,

I have a general usage question about Falcon. I don't see a "user" mailing
list, so I am sending it here. If there's a better place to direct the
question, please let me know.

I have been looking at the OnBoarding:
http://falcon.incubator.apache.org/docs/OnBoarding.html

I understand that Falcon uses Oozie underneath. What is the advantage of
using Falcon instead of using Oozie directly?

It looks like you can specify in your Input Feed information about your
input data, but you can parameterize your paths in Oozie as well (using
job.properties).

I have also heard conflicting information about whether Falcon generates
Oozie workflow.xml files, but in that on-boarding example, it looks like
you need to create the workflow.xml manually. Which is correct?

Thanks in advance,
Alex.

Re: falcon vs oozie

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Alex,

As we discussed on IRC, Oozie is mostly focused on job execution: it
triggers and schedules jobs. It can start a job when another one
completes, or start a different job in case of failure.

Falcon is more data-motion oriented: it can trigger a job when the data
changes (for instance, when the data is produced by another job).
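
To illustrate the distinction, here is a minimal Python sketch of
data-driven triggering: a job instance becomes runnable only once every
input path it needs is available. This is a toy model and not Falcon or
Oozie code; the function name ready_to_run and the paths are hypothetical.

```python
from typing import Iterable, Set


def ready_to_run(required_paths: Iterable[str], available_paths: Set[str]) -> bool:
    """Return True only when every required input path is available."""
    return all(path in available_paths for path in required_paths)


# Hypothetical hourly inputs for one job instance.
required = ["/data/clicks/2014-09-09-00", "/data/clicks/2014-09-09-01"]

# Only the first hour has landed so far, so the job must wait.
available = {"/data/clicks/2014-09-09-00"}
print(ready_to_run(required, available))

# Once the second hour arrives, the job can be triggered.
available.add("/data/clicks/2014-09-09-01")
print(ready_to_run(required, available))
```

A job-oriented scheduler would instead key the trigger on a clock or on
the completion of an upstream job, whether or not the data is there.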

You are right: in order to create a Falcon process, you have to create
the workflow.xml by hand. But a process can also be a Pig process, in
which case you don't need the workflow.xml. I have proposed adding new
kinds of Falcon processes so that you don't have to create the
workflow.xml by hand.

Regards
JB


-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: falcon vs oozie

Posted by Shwetha GS <sh...@inmobi.com>.
Alex,

Oozie lets you schedule data processing jobs. The emphasis is mainly on
processing, and Oozie lets you define this processing through a workflow
and a coordinator (a recurring workflow). You can specify the input
datasets for data processing in the coordinator, where you give data
properties such as path and frequency. If two coordinators depend on the
same data, these details have to be defined twice. If you then want to add
data eviction (deleting old data), you have to define another coordinator.
Oozie provides APIs to manage these coordinators, but there is no easy way
to define and track the data lifecycle.


In contrast, Falcon gives you a data view. Data is defined as a Feed
entity (with a unique name), which contains the data path, the frequency,
the clusters where the data exists, how long the data is retained in each
cluster (eviction), how the data is replicated across clusters, and so on.
Standard data recipes such as acquisition, eviction, and replication are
available directly. To enable data processing across datasets, Falcon
exposes a Process entity, which contains the input and output feed names
(referencing feeds already defined), the frequency of processing, and how
the data should be processed. Data processing can be defined using a Pig
script, a Hive script, or an Oozie workflow.
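
As a rough illustration of the "define once, reference by name" idea, here
is a small Python sketch. It is a toy model only: the classes Feed and
Process, their fields, and the helper resolve_inputs are hypothetical and
do not reflect Falcon's actual entity schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Feed:
    """Toy stand-in for a feed: data properties declared once, under a unique name."""
    name: str
    path: str
    frequency_hours: int
    retention_days: int


@dataclass
class Process:
    """Toy stand-in for a process: refers to its feeds by name only."""
    name: str
    inputs: List[str]
    outputs: List[str]


def resolve_inputs(process: Process, feeds: Dict[str, Feed]) -> List[Feed]:
    """Look up each input feed by name; raises KeyError for an undefined feed."""
    return [feeds[name] for name in process.inputs]


# Hypothetical entities, for illustration only.
feeds = {
    "clicks": Feed("clicks", "/data/clicks", frequency_hours=1, retention_days=7),
    "summary": Feed("summary", "/data/summary", frequency_hours=24, retention_days=30),
    "audit": Feed("audit", "/data/audit", frequency_hours=24, retention_days=90),
}

# Two processes depend on the same data without repeating its path,
# frequency, or retention.
daily_summary = Process("daily-summary", inputs=["clicks"], outputs=["summary"])
click_audit = Process("click-audit", inputs=["clicks"], outputs=["audit"])

for process in (daily_summary, click_audit):
    for feed in resolve_inputs(process, feeds):
        print(process.name, "reads", feed.path,
              "retained for", feed.retention_days, "days")
```

Because both processes resolve "clicks" to the same definition, changing
its retention or path is a single edit, rather than one edit per
coordinator.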

In the backend, the different data lifecycles are implemented using a
scheduler, which is currently Oozie but can be replaced easily. Falcon
APIs hide the scheduler details and give you an easy way to define and
manage the data lifecycles.

Regards,
Shwetha







Re: falcon vs oozie

Posted by Alex Nastetsky <an...@spryinc.com>.
Srikanth, looks like you responded with an empty message.



Re: falcon vs oozie

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.

Sent from my iPhone
