Posted to dev@falcon.apache.org by Srikanth Sundarrajan <sr...@hotmail.com> on 2014/01/09 11:16:18 UTC

[DISCUSS] Falcon Roadmap



Hi Everyone,

We have made good progress on the Falcon project since its incubation. We have had an initial release (0.3-incubating), after which we added support for Hadoop 2.0, HCatalog integration, and the Hive execution engine. Venkatesh Seetharam has recently called a vote for the release of 0.4-incubating with these features. We are now actively adding security features alongside a number of operability improvements, all of which should go out in the 0.5-incubating release in the near future. At this juncture, I wanted to get the community's thoughts on the following feature additions to Falcon over the next few releases.

1. Falcon Administration and Monitoring Dashboard (Umbrella JIRA: FALCON-37)
2. Support for Falcon Process Designer (Umbrella JIRA: FALCON-253)
3. Support stream abstractions and allow for stream processing through Falcon over Apache Storm

Regards
Srikanth Sundarrajan

Re: [DISCUSS] Falcon Roadmap

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
What do you think about adding an "implicit" workflow for DataCapture to the roadmap?

Let me take an example:

1/ I submit a cluster:
bin/falcon entity -submit -type cluster -file local.xml
where local.xml contains:

<cluster colo="local" description="Local cluster" name="local" xmlns="uri:falcon:cluster:0.1">
     <interfaces>
         <interface type="readonly" endpoint="hftp://localhost:50010" version="1.1.2"/>
         <interface type="write" endpoint="hdfs://localhost:8020" version="1.1.2"/>
         <interface type="execute" endpoint="localhost:8021" version="1.1.2"/>
         <interface type="workflow" endpoint="http://localhost:11000/oozie/" version="4.0.0"/>
         <interface type="messaging" endpoint="tcp://localhost:61616" version="5.7.0"/>
     </interfaces>
     <locations>
         <location name="staging" path="/falcon/staging"/>
         <location name="temp" path="/falcon/tmp"/>
         <location name="working" path="/falcon/working"/>
     </locations>
     <properties>
     </properties>
</cluster>

2/ I submit a feed:
bin/falcon entity -submit -type feed -file feed.xml
where feed.xml contains:

<feed description="" name="output" xmlns="uri:falcon:feed:0.1">
     <groups>output</groups>

     <frequency>minutes(1)</frequency>
     <timezone>UTC</timezone>

     <clusters>
         <cluster name="local">
             <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
             <retention limit="minutes(2)" action="delete"/>
             <capture interval="minutes(5)"/>
         </cluster>
     </clusters>

     <locations>
         <location type="data" path="/data/output"/>
     </locations>

     <ACL owner="jbonofre" group="group" permission="0x644"/>
     <schema location="/falcon/schema/none" provider="none"/>
</feed>

Note the <capture/> element. With the capture element, the feed is not
really generated at the frequency interval; instead, it is checked at the
capture interval. The check compares the feed against its previous state
(using the retention) and sends a message to the JMS broker. The message
contains the "delta" (basically the changes) between the previous check
and the latest one, if there are any.
This means that we have to create a coordinator job in Oozie that runs at
the capture interval and executes a "special" job to manage the delta/diff.
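
To make the delta idea concrete, here is a minimal sketch in Python of what
such a "special" job might compute. Everything in it is an assumption for
illustration only: the snapshot shape (a mapping from file path to
modification time), the function names compute_delta and build_message, and
the JSON message layout are hypothetical and are not part of Falcon. Sending
the message to the JMS broker is deliberately left out.

```python
import json


def compute_delta(previous, current):
    """Return what changed between two snapshots of the feed location.

    Each snapshot is a dict mapping a file path to its modification time
    (a hypothetical shape chosen for this sketch).
    """
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "removed": removed, "modified": modified}


def build_message(feed_name, delta):
    """Serialize the delta as JSON, or return None when nothing changed."""
    if not any(delta.values()):
        return None
    return json.dumps({"feed": feed_name, "delta": delta}, sort_keys=True)


# Two consecutive checks of the feed path, one capture interval apart.
previous = {"/data/output/a": 100, "/data/output/b": 100}
current = {"/data/output/b": 160, "/data/output/c": 170}

delta = compute_delta(previous, current)
message = build_message("output", delta)
```

Returning None when the two snapshots are identical matches the "if there
are any" case above: no change means no message is published.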

WDYT ?

Regards
JB

On 01/09/2014 01:10 PM, Sharad Agarwal wrote:
> Looks great.
> I would also like to propose adding 4) Support stream processing over Tez.
> Once we have the streaming abstractions, it should be easy to build for Tez
> too. This would allow faster pipelines to run in Hadoop itself.

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [DISCUSS] Falcon Roadmap

Posted by Sharad Agarwal <sh...@apache.org>.
Looks great.
I would also like to propose adding 4) Support stream processing over Tez.
Once we have the streaming abstractions, it should be easy to build for Tez
too. This would allow faster pipelines to run in Hadoop itself.


On Thu, Jan 9, 2014 at 4:01 PM, Srikanth Sundarrajan <sr...@hotmail.com> wrote:

> Hi Jean,
>
> If there is enough interest in this area and it is in line with the
> larger objectives of the project, we should certainly be able to add it to
> the Roadmap. I am keen to learn more about this and your thinking on the
> topic.
>
> Regards
> Srikanth Sundarrajan

RE: [DISCUSS] Falcon Roadmap

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
Hi Jean,

If there is enough interest in this area and it is in line with the larger objectives of the project, we should certainly be able to add it to the Roadmap. I am keen to learn more about this and your thinking on the topic.

Regards
Srikanth Sundarrajan

> Date: Thu, 9 Jan 2014 11:20:54 +0100
> From: jb@nanthrax.net
> To: dev@falcon.incubator.apache.org
> Subject: Re: [DISCUSS] Falcon Roadmap
> 
> Hi Srikanth,
> 
> The roadmap looks good to me.
> 
> Do we have any plans for data anonymity?
> 
> On my side, I'm working on an example of CDC with Falcon and Camel, and
> on support for Falcon commands directly in Karaf. (An update to ActiveMQ
> 5.9.0 is in progress too.)
> 
> Regards
> JB

Re: [DISCUSS] Falcon Roadmap

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Srikanth,

The roadmap looks good to me.

Do we have any plans for data anonymity?

On my side, I'm working on an example of CDC with Falcon and Camel, and
on support for Falcon commands directly in Karaf. (An update to ActiveMQ
5.9.0 is in progress too.)

Regards
JB

On 01/09/2014 11:16 AM, Srikanth Sundarrajan wrote:
> Hi Everyone,
>
> We have made good progress on the Falcon project since its incubation. We have had an initial release (0.3-incubating), after which we added support for Hadoop 2.0, HCatalog integration, and the Hive execution engine. Venkatesh Seetharam has recently called a vote for the release of 0.4-incubating with these features. We are now actively adding security features alongside a number of operability improvements, all of which should go out in the 0.5-incubating release in the near future. At this juncture, I wanted to get the community's thoughts on the following feature additions to Falcon over the next few releases.
>
> 1. Falcon Administration and Monitoring Dashboard (Umbrella JIRA: FALCON-37)
> 2. Support for Falcon Process Designer (Umbrella JIRA: FALCON-253)
> 3. Support stream abstractions and allow for stream processing through Falcon over Apache Storm
>
> Regards
> Srikanth Sundarrajan

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

RE: [DISCUSS] Falcon Roadmap

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
Makes sense. I will update the project wiki with these. I would also request @jb to provide more details about the data anonymity use case that he mentioned earlier in this thread.

Regards
Srikanth Sundarrajan

> Date: Thu, 9 Jan 2014 13:36:49 -0800
> Subject: Re: [DISCUSS] Falcon Roadmap
> From: venkatesh@innerzeal.com
> To: dev@falcon.incubator.apache.org
> 
> +1 Srikanth. I'd love to add a couple more:
> 
> * Audit and Lineage
> * Data ingest, since it's a barrier to entry for most users
> 

Re: [DISCUSS] Falcon Roadmap

Posted by Seetharam Venkatesh <ve...@innerzeal.com>.
+1 Srikanth. I'd love to add a couple more:

* Audit and Lineage
* Data ingest, since it's a barrier to entry for most users


On Thu, Jan 9, 2014 at 2:16 AM, Srikanth Sundarrajan <sr...@hotmail.com> wrote:

> Hi Everyone,
>
> We have made good progress on the Falcon project since its incubation.
> We have had an initial release (0.3-incubating), after which we added
> support for Hadoop 2.0, HCatalog integration, and the Hive execution
> engine. Venkatesh Seetharam has recently called a vote for the release
> of 0.4-incubating with these features. We are now actively adding
> security features alongside a number of operability improvements, all of
> which should go out in the 0.5-incubating release in the near future. At
> this juncture, I wanted to get the community's thoughts on the following
> feature additions to Falcon over the next few releases.
>
> 1. Falcon Administration and Monitoring Dashboard (Umbrella JIRA:
>    FALCON-37)
> 2. Support for Falcon Process Designer (Umbrella JIRA: FALCON-253)
> 3. Support stream abstractions and allow for stream processing through
>    Falcon over Apache Storm
>
> Regards
> Srikanth Sundarrajan




-- 
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry