Posted to user@mahout.apache.org by Chris K Wensel <ch...@wensel.net> on 2011/11/01 16:35:02 UTC

Re: Has anyone tried Spark with Mahout?

I've made a few comments on the differences here:

http://www.quora.com/Apache-Hadoop/What-are-the-differences-between-Crunch-and-Cascading/answer/Chris-K-Wensel

chris

On Oct 31, 2011, at 2:44 PM, Ted Dunning wrote:

> +Chris Wensel
> 
> The biggest difference between Cascading and Plume/Crunch/FlumeJava is that the latter all do more lazy evaluation and program restructuring, and much less large-scale scheduling. Certainly the PCFJ group does much more to make the results look like a Java collection and is better at talking to conventional Java types.
> 
> I think that Cascading could do the more extensive job graph rewrites. It would be hard for Cascading to generalize its data structures, though, without major backward-compatibility issues.
> 
> In sum, I think that the difference between Cascading and PCFJ is largely a matter of taste, not inherent system design.
> 
> 
> On Mon, Oct 31, 2011 at 2:36 PM, Charles Earl <ch...@me.com> wrote:
> Thanks. This is an insightful discussion. Having just glanced at both Plume and Crunch, I find they seem similar to Cascading in the sense of being dataflow languages. I wonder whether you are able to comment on any important distinctions.

--
Chris K Wensel
chris@concurrentinc.com
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support for Cascading