Posted to dev@falcon.apache.org by Alex Nastetsky <an...@spryinc.com> on 2014/09/09 17:31:26 UTC
falcon vs oozie
Hi,
I have a general usage question about Falcon. I don't see a "user" mailing
list, so I am sending it here. If there's a better place to direct the
question, please let me know.
I have been looking at the OnBoarding guide:
http://falcon.incubator.apache.org/docs/OnBoarding.html
I understand that Falcon uses Oozie underneath. What is the advantage of
using Falcon instead of using Oozie directly?
It looks like you can specify information about your input data in your
input Feed, but you can parameterize your paths in Oozie as well (using
job.properties).
I have also heard conflicting information about whether Falcon generates
Oozie workflow.xml files, but in that on-boarding example, it looks like
you need to create the workflow.xml manually. Which is correct?
Thanks in advance,
Alex.
Re: falcon vs oozie
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Alex,
As we discussed on IRC, Oozie is mostly focused on job execution: it can
trigger and schedule jobs. It can start a job when another one is done,
or start a different job in case of failure.
Falcon is more oriented toward data motion: it can trigger a job when
the data changes (for instance, data produced by another job).
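To make the data-driven idea concrete, here is a minimal Python sketch
(not Falcon code): a run is considered ready only when the input data
instance for its time slot already exists. The
`${YEAR}/${MONTH}/${DAY}/${HOUR}` path template, the `/data/clicks` path
and the helper names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def instance_path(template: str, ts: datetime) -> str:
    """Expand a dated path template (hypothetical ${YEAR}-style variables)."""
    return (template.replace("${YEAR}", f"{ts.year:04d}")
                    .replace("${MONTH}", f"{ts.month:02d}")
                    .replace("${DAY}", f"{ts.day:02d}")
                    .replace("${HOUR}", f"{ts.hour:02d}"))


def ready_runs(template, start, end, step, available_paths):
    """Return the slot times in [start, end) whose input instance exists."""
    ready = []
    ts = start
    while ts < end:
        # A data-driven trigger waits on the data, not only on the clock.
        if instance_path(template, ts) in available_paths:
            ready.append(ts)
        ts += step
    return ready


# Usage: hours 00 and 02 have landed, hour 01 has not.
TEMPLATE = "/data/clicks/${YEAR}/${MONTH}/${DAY}/${HOUR}"
landed = {"/data/clicks/2014/09/09/00", "/data/clicks/2014/09/09/02"}
runs = ready_runs(TEMPLATE, datetime(2014, 9, 9, 0), datetime(2014, 9, 9, 3),
                  timedelta(hours=1), landed)
print(runs)
```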
You are right: in order to create a Falcon process, you have to create
the workflow.xml by hand. But a process can also be a Pig process, and
in that case you don't need a workflow.xml. I have proposed adding new
kinds of Falcon processes so that you don't have to create the
workflow.xml by hand.
Regards
JB
--
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: falcon vs oozie
Posted by Shwetha GS <sh...@inmobi.com>.
Alex,
Oozie lets you schedule data processing jobs. The emphasis is mainly on
processing, and Oozie lets you define this processing through a workflow
and a coordinator (a recurring workflow). You can specify the input
datasets for data processing (in the coordinator), where you specify data
properties like path, frequency, etc. If two coordinators depend on the
same data, these details have to be defined twice. Now, if you want to add
data eviction (deleting old data), you have to define another coordinator.
Oozie provides APIs to manage these coordinators, but there is no easy way
to define and track the data lifecycle.
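As a rough illustration of what that extra eviction coordinator has to
compute, the following Python sketch (not Oozie code) selects the data
instances that fall outside a retention window. It assumes each instance
is identified by its nominal timestamp and that retention is a fixed
duration; both are simplifications made for this example.

```python
from datetime import datetime, timedelta


def instances_to_evict(instance_times, now, retention):
    """Return instance timestamps strictly older than now - retention, oldest first."""
    cutoff = now - retention
    return sorted(ts for ts in instance_times if ts < cutoff)


# Usage: keep 7 days of daily instances as of 2014-09-09.
daily = [datetime(2014, 9, 1), datetime(2014, 9, 5), datetime(2014, 9, 8)]
evict = instances_to_evict(daily, now=datetime(2014, 9, 9),
                           retention=timedelta(days=7))
print(evict)
```

In plain Oozie this logic lives in its own coordinator, separate from the
dataset definitions it depends on.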
In contrast, Falcon gives a data-centric view. Data is defined as a Feed
entity (with a unique name), which contains the data path, the frequency,
the clusters where this data exists, how long the data is retained in each
cluster (eviction), how the data is replicated across clusters, and so on.
The standard data recipes like acquisition, eviction and replication are
available directly. To enable data processing across datasets, Falcon
exposes a Process entity, which contains the input and output feed names
(referencing feeds that are already defined), the frequency of processing,
and how the data should be processed. Data processing can be defined using
either a Pig script, a Hive script or an Oozie workflow.
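A minimal Python sketch of the entity model described above: a feed is
defined once under a unique name, and processes refer to feeds by name
instead of repeating path and frequency. The class names, fields and the
`Registry` helper are hypothetical and only mirror the concepts in this
reply; they are not Falcon's API or its XML entity schema.

```python
from dataclasses import dataclass, field
from datetime import timedelta

ENGINES = {"pig", "hive", "oozie"}  # ways to express processing, per this reply


@dataclass
class Feed:
    name: str                      # unique feed name
    path: str                      # data location template
    frequency: timedelta
    retention: dict = field(default_factory=dict)  # cluster name -> timedelta


@dataclass
class Process:
    name: str
    inputs: list                   # names of already-defined feeds
    outputs: list
    frequency: timedelta
    engine: str                    # "pig", "hive" or "oozie"


class Registry:
    """Hypothetical store that enforces name references between entities."""

    def __init__(self):
        self.feeds = {}
        self.processes = {}

    def submit_feed(self, feed: Feed) -> None:
        if feed.name in self.feeds:
            raise ValueError(f"feed {feed.name!r} already defined")
        self.feeds[feed.name] = feed

    def submit_process(self, proc: Process) -> None:
        if proc.engine not in ENGINES:
            raise ValueError(f"unsupported engine {proc.engine!r}")
        missing = [n for n in proc.inputs + proc.outputs if n not in self.feeds]
        if missing:
            raise ValueError(f"unknown feeds: {missing}")
        self.processes[proc.name] = proc

    def consumers(self, feed_name: str) -> list:
        """Names of processes that read the given feed, sorted."""
        return sorted(p.name for p in self.processes.values()
                      if feed_name in p.inputs)


# Usage: one "clicks" definition shared by two processes.
reg = Registry()
hourly = timedelta(hours=1)
reg.submit_feed(Feed("clicks", "/data/clicks/${YEAR}/${MONTH}/${DAY}/${HOUR}",
                     hourly, {"primary": timedelta(days=7)}))
reg.submit_feed(Feed("sessions", "/data/sessions/${YEAR}/${MONTH}/${DAY}/${HOUR}",
                     hourly))
reg.submit_feed(Feed("click-report", "/data/report/${YEAR}/${MONTH}/${DAY}/${HOUR}",
                     hourly))
reg.submit_process(Process("sessionize", ["clicks"], ["sessions"], hourly, "pig"))
reg.submit_process(Process("hourly-report", ["clicks"], ["click-report"],
                           hourly, "hive"))
print(reg.consumers("clicks"))
```

Because the path and retention live only on the feed, changing them in one
place is enough for every process that references it.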
In the backend, the different data lifecycles are implemented using a
scheduler, which is currently Oozie but can easily be replaced. The Falcon
APIs hide the scheduler details and give an easy way to define and manage
the data lifecycles.
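The design point here is an abstraction layer between the lifecycle
definitions and the scheduler. A minimal Python sketch of that pattern,
assuming a hypothetical `Scheduler` interface; the in-memory backend
stands in for a real one (such as a backend that would submit Oozie
coordinators), and none of these names come from Falcon's code.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical backend interface the lifecycle layer depends on."""

    @abstractmethod
    def schedule(self, entity_name: str, frequency_minutes: int) -> str:
        """Schedule recurring runs for an entity and return a job id."""


class InMemoryScheduler(Scheduler):
    """Stand-in backend; a real one would translate to e.g. Oozie jobs."""

    def __init__(self):
        self.jobs = {}

    def schedule(self, entity_name, frequency_minutes):
        job_id = f"job-{len(self.jobs) + 1:04d}"
        self.jobs[job_id] = (entity_name, frequency_minutes)
        return job_id


class LifecycleManager:
    """User-facing layer: knows about entities, not about the backend."""

    def __init__(self, scheduler: Scheduler):
        self.scheduler = scheduler

    def schedule_entity(self, name: str, frequency_minutes: int) -> str:
        if frequency_minutes <= 0:
            raise ValueError("frequency must be positive")
        return self.scheduler.schedule(name, frequency_minutes)


# Usage: swapping the backend only changes this constructor argument.
manager = LifecycleManager(InMemoryScheduler())
first = manager.schedule_entity("clicks-retention", 60)
second = manager.schedule_entity("clicks-replication", 60)
print(first, second)
```

Because `LifecycleManager` depends only on the interface, replacing the
scheduler does not change how entities are defined.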
Regards,
Shwetha
Re: falcon vs oozie
Posted by Alex Nastetsky <an...@spryinc.com>.
Srikanth, looks like you responded with an empty message.
On Tue, Sep 9, 2014 at 11:49 AM, Srikanth Sundarrajan <sr...@hotmail.com>
wrote:
>
>
> Sent from my iPhone
>
Re: falcon vs oozie
Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
Sent from my iPhone