Posted to dev@falcon.apache.org by Jean-Baptiste Onofré <jb...@nanthrax.net> on 2015/02/03 15:51:38 UTC

Falcon with Spark ?

Hi all,

I'm working (and finally resuming my work ;)) on some Falcon features:
- Update and improvements on the ActiveMQ broker
- Complete CDC support of diff/gap storage
- Support of more workflow entities (MapReduce directly instead of an Oozie 
workflow.xml, etc.)

For the workflow entities, I would like to evaluate "direct" support 
for Spark.
More generally, I wonder whether we could "optionally" leverage 
Spark for some internal Falcon processes (like eviction, etc.).

WDYT ?

Regards
JB
-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Falcon with Spark ?

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Yes, it's what I saw.

Regarding the heavy dependency, that's why I described it as "optional": 
the user has to enable it explicitly with a specific configuration 
(knowing what they are doing).

Regarding the Oozie support, I agree, but it still requires a workflow.xml 
for Oozie. My plan is to spare users from having to provide a workflow.xml, 
and instead let them use the process configuration to define the job to run 
(MapReduce directly, Spark, etc.).

Regards
JB

On 02/03/2015 06:30 PM, Srikanth Sundarrajan wrote:
> Yes, support for Spark was specifically added through https://issues.apache.org/jira/browse/OOZIE-1983, to allow users of Oozie or Falcon to run Spark jobs. Moving the retention job to Spark would create a heavy dependency on Spark within Falcon.
>
> With https://issues.apache.org/jira/browse/FALCON-965, it should be possible to create an alternate implementation of eviction.
>
> Regards
> Srikanth Sundarrajan
>
>> Date: Tue, 3 Feb 2015 15:51:38 +0100
>> From: jb@nanthrax.net
>> To: dev@falcon.incubator.apache.org
>> Subject: Falcon with Spark ?
>>
>> Hi all,
>>
>> I'm working (and finally resuming my work ;)) on some Falcon features:
>> - Update and improvements on the ActiveMQ broker
>> - Complete CDC support of diff/gap storage
>> - Support of more workflow entities (MapReduce directly instead of an Oozie
>> workflow.xml, etc.)
>>
>> For the workflow entities, I would like to evaluate "direct" support
>> for Spark.
>> More generally, I wonder whether we could "optionally" leverage
>> Spark for some internal Falcon processes (like eviction, etc.).
>>
>> WDYT ?
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

RE: Falcon with Spark ?

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
Yes, support for Spark was specifically added through https://issues.apache.org/jira/browse/OOZIE-1983, to allow users of Oozie or Falcon to run Spark jobs. Moving the retention job to Spark would create a heavy dependency on Spark within Falcon.

With https://issues.apache.org/jira/browse/FALCON-965, it should be possible to create an alternate implementation of eviction.

Regards
Srikanth Sundarrajan
