Posted to dev@crunch.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2014/09/11 13:16:34 UTC

[jira] [Commented] (CRUNCH-470) Add hdfs/yarn minicluster crunch pipeline

    [ https://issues.apache.org/jira/browse/CRUNCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129890#comment-14129890 ] 

Gabriel Reid commented on CRUNCH-470:
-------------------------------------

Do you mean the addition of a new Pipeline implementation (in addition to MemPipeline, MRPipeline, and SparkPipeline)? The MRPipeline implementation will already run on YARN as long as Crunch is compiled for hadoop2, so there shouldn't be a new Pipeline impl needed for this.
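
For illustration only, a minimal sketch of the client-side Hadoop 2 setting that routes MapReduce jobs (and so the jobs an MRPipeline submits) to YARN. The property name is the standard Hadoop 2 one. Placing it in mapred-site.xml is an assumption about a typical setup, and nothing here is taken from the Crunch build itself:

```xml
<!-- mapred-site.xml (typical location; an assumption, adjust to your setup) -->
<configuration>
  <property>
    <!-- Standard Hadoop 2 switch: "yarn" submits MapReduce jobs to YARN,
         "local" runs them in-process with the local job runner. -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

With a setting like this visible to the client's Hadoop Configuration, an unchanged MRPipeline should submit its jobs to YARN.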

On the other hand, if you're referring to testing pipelines on a pseudo-distributed mini cluster, that is already possible. This is what the HFileTargetIT integration test does: a mini-cluster (with HDFS, etc.) is spun up and the pipeline is run there.

> Add hdfs/yarn minicluster crunch pipeline
> -----------------------------------------
>
>                 Key: CRUNCH-470
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-470
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Rafal Wojdyla
>            Assignee: Josh Wills
>            Priority: Minor
>
> Crunch currently has two pipelines:
> * MemPipeline
> * MRPipeline
> MemPipeline is an in-memory pipeline based on local in-memory mapreduce mode.
> MRPipeline is a distributed pipeline based on distributed MapReduce.
> Using an HDFS/YARN minicluster, it's possible to better emulate a Hadoop cluster, and it could be a 'final test' before running on the cluster.


