Posted to dev@falcon.apache.org by Sharad Agarwal <sh...@apache.org> on 2015/02/11 09:38:59 UTC

Streaming Feed

I am looking for a generic, schema-aware feed construct for streaming
workflows. The schema can be managed by a catalog service like HCatalog. The
streaming workflow executor would be a system like
Storm/SparkStreaming/Samza.

I want to know whether this is the right thing for Falcon to support and, if
so, what the plugin interface for it would be. Would this be a new
implementation of the workflow engine?

Thanks
Sharad

Re: Streaming Feed

Posted by Sharad Agarwal <sh...@apache.org>.
Sounds good. I think we should talk more about this at our next
contributors' meetup.


On Thu, Feb 12, 2015 at 9:20 PM, Srikanth Sundarrajan <sr...@hotmail.com>
wrote:

> This has been an idea lingering in my mind for a while. I would be very
> supportive of any effort to create a stream abstraction along the lines of
> feed (this may not be required if we do a major overhaul of orchestration
> in Falcon, doing away with the tight requirement that a feed have a
> frequency) and to have processes work with these streams. In that case the
> orchestration should happen through Nimbus or Spark Master instead of
> Oozie.
>
> In other words:
> * Feed/Stream to be a primitive entity in Falcon that declares there is a
> continuous flow of data conforming to a schema, not bound to any arrival
> periodicity
> * Replication/Mirroring on this would use standard data transport
> mechanisms to ship data in a streaming fashion as well
> * Processes defined over these continuous streams are to be orchestrated
> by an appropriate engine such as Nimbus (in the case of Storm) or a similar
> system. Processes defined this way also don't have periodicity and are
> continuous.
>
> This topic requires more conversation before we figure out the way forward.
> I am assuming more than one of us is thinking about this.
>
> Regards
> Srikanth Sundarrajan

RE: Streaming Feed

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
This has been an idea lingering in my mind for a while. I would be very supportive of any effort to create a stream abstraction along the lines of feed (this may not be required if we do a major overhaul of orchestration in Falcon, doing away with the tight requirement that a feed have a frequency) and to have processes work with these streams. In that case the orchestration should happen through Nimbus or Spark Master instead of Oozie.

In other words:
* Feed/Stream to be a primitive entity in Falcon that declares there is a continuous flow of data conforming to a schema, not bound to any arrival periodicity
* Replication/Mirroring on this would use standard data transport mechanisms to ship data in a streaming fashion as well
* Processes defined over these continuous streams are to be orchestrated by an appropriate engine such as Nimbus (in the case of Storm) or a similar system. Processes defined this way also don't have periodicity and are continuous.
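The three points above can be illustrated with a small entity model. This is a hypothetical sketch in Python, not Falcon code: the `Feed`, `Stream` and `Process` classes, the `engine` setting and the engine-to-master mapping are all assumptions made for illustration. It only shows the proposed split, where periodic feeds keep Oozie scheduling while processes over continuous streams would be handed to Nimbus (Storm) or Spark Master.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List, Union

# Hypothetical entity model for illustration only; these classes are not Falcon APIs.


@dataclass(frozen=True)
class Feed:
    """Periodic feed: each instance corresponds to one `frequency` interval."""
    name: str
    frequency: timedelta


@dataclass(frozen=True)
class Stream:
    """Continuous, schema-described flow of records with no arrival periodicity."""
    name: str
    schema: Dict[str, str]  # column -> type; in practice kept in a catalog such as HCatalog


# Assumed mapping from streaming engine to the master that would run a continuous process.
STREAM_ORCHESTRATORS = {"storm": "Nimbus", "spark-streaming": "Spark Master"}


@dataclass
class Process:
    name: str
    inputs: List[Union[Feed, Stream]]
    engine: str = "storm"  # hypothetical setting, only consulted for stream inputs

    def is_continuous(self) -> bool:
        kinds = {type(i) for i in self.inputs}
        if len(kinds) > 1:
            # The proposal does not say how mixed inputs should behave, so reject them here.
            raise ValueError(f"process {self.name!r} mixes feeds and streams")
        return kinds == {Stream}

    def orchestrator(self) -> str:
        """Return the system that would schedule this process."""
        if self.is_continuous():
            return STREAM_ORCHESTRATORS[self.engine]
        return "Oozie"  # periodic processes keep the existing Oozie coordinator path
```

Rejecting mixed feed/stream inputs is a deliberate simplification in this sketch; how a process joining a periodic feed with a continuous stream should be scheduled is exactly the kind of question still open for discussion.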

This topic requires more conversation before we figure out the way forward. I am assuming more than one of us is thinking about this.

Regards
Srikanth Sundarrajan

> Date: Wed, 11 Feb 2015 15:30:47 +0530
> Subject: Re: Streaming Feed
> From: sharad@apache.org
> To: dev@falcon.apache.org
> 
> Thanks Jean, this will be quite useful. I am wondering if this will also
> require a new partitioning construct in the feed, such as micro-batches.
> 
> Sharad

Re: Streaming Feed

Posted by Sharad Agarwal <sh...@apache.org>.
Thanks Jean, this will be quite useful. I am wondering if this will also
require a new partitioning construct in the feed, such as micro-batches.
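One way such a micro-batch partitioning construct could work is to floor each record's event time to a fixed batch interval and use the batch start as the partition value, similar in spirit to the time-based instances of periodic feeds. This is a hypothetical Python sketch: the helper names, the interval and the `batch=...` partition layout are illustrative assumptions, not an existing Falcon feature.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers; the batch interval and partition layout are illustrative assumptions.

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def batch_start(ts: datetime, interval: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its micro-batch window (in UTC)."""
    offset = (ts - EPOCH) // interval  # number of whole intervals since the epoch
    return EPOCH + offset * interval


def partition_key(ts: datetime, interval: timedelta) -> str:
    """Partition value for the micro-batch containing `ts`."""
    return "batch=" + batch_start(ts, interval).strftime("%Y-%m-%dT%H:%M:%SZ")


def group_into_batches(
    records: Iterable[Tuple[datetime, str]], interval: timedelta
) -> Dict[str, List[str]]:
    """Group (event_time, payload) records by micro-batch partition."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for ts, payload in records:
        batches[partition_key(ts, interval)].append(payload)
    return dict(batches)
```

For example, with a one-minute interval a record stamped 2015-02-11 09:38:59 UTC falls into `batch=2015-02-11T09:38:00Z`. Aligning windows to the epoch means every producer computes the same batch boundaries independently, without coordinating on when the stream started.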

Sharad

On Wed, Feb 11, 2015 at 2:34 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Sharad,
>
> I sent an e-mail last week about supporting Spark (SparkStreaming) in
> workflow/process. It's basically very close to what you propose.
>
> IMHO, it should be a new implementation of the workflow engine, or at least
> support for a new kind of process (that's what I have in mind).
>
> Regards
> JB

Re: Streaming Feed

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Sharad,

I sent an e-mail last week about supporting Spark (SparkStreaming) in
workflow/process. It's basically very close to what you propose.

IMHO, it should be a new implementation of the workflow engine, or at least
support for a new kind of process (that's what I have in mind).

Regards
JB

On 02/11/2015 09:38 AM, Sharad Agarwal wrote:
> I am looking for a generic, schema-aware feed construct for streaming
> workflows. The schema can be managed by a catalog service like HCatalog. The
> streaming workflow executor would be a system like
> Storm/SparkStreaming/Samza.
>
> I want to know whether this is the right thing for Falcon to support and, if
> so, what the plugin interface for it would be. Would this be a new
> implementation of the workflow engine?
>
> Thanks
> Sharad
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com