Posted to user@crunch.apache.org by Danny Morgan <un...@hotmail.com> on 2015/01/25 19:02:11 UTC

Google Cloud Dataflow?

So should I start porting my Crunch code over to the Cloud Dataflow SDK?
Danny

Re: Google Cloud Dataflow?

Posted by Peter Dolan <pe...@nunahealth.com>.
The promise of being able to port application logic to a different
execution engine by dropping in a different Pipeline implementation was one
of the key reasons for adopting Crunch at Nuna Health.  I'm strongly
supportive of a third Pipeline implementation :)

Re: Google Cloud Dataflow?

Posted by Josh Wills <jo...@gmail.com>.
:)

I never want anyone to have to rewrite code in order to pick up and move it
to a different execution engine. At the very least, we should write a
wrapper that lets you run existing o.a.c.DoFn subclasses in Dataflow
pipelines, and maybe even a DataflowPipeline implementation to port
existing Crunch pipelines over to Dataflow once it's easier to bring Hadoop
Input/OutputFormats over.

I have some thoughts on why I think Dataflow is interesting, esp. to
developers who are familiar with Crunch/Spark/Scalding, but I'll send them
out in a different email b/c they're getting kind of long.

J
