Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2014/01/08 23:00:50 UTC
[jira] [Created] (BIGTOP-1177) Puppet Recipes: Can we modularize them?
jay vyas created BIGTOP-1177:
--------------------------------
Summary: Puppet Recipes: Can we modularize them?
Key: BIGTOP-1177
URL: https://issues.apache.org/jira/browse/BIGTOP-1177
Project: Bigtop
Issue Type: Improvement
Reporter: jay vyas
In the spirit of interoperability, can we work on modularizing the Bigtop puppet recipes so that "hadoop_cluster_node" is not defined as an HDFS-specific class? I'm not a puppet expert, but here are two reasons why:
- For HDFS users: In some use cases we might want to use Bigtop to provision many nodes, only some of which are data nodes. For example: let's say our cluster is crawling the web in mappers and doing some machine learning, distilling each large page into a small relational database tuple that summarizes the "entities" in the page. In this case we don't necessarily benefit much from locality, because we might be CPU-bound rather than network/IO-bound. So we might want to provision a cluster of 50 machines: 40 multicore, CPU-heavy ones and just 10 datanodes to support the DFS (see the sketch after this list). I know this is an extreme case, but it's a good example.
- For NON-HDFS users: One important aspect of emerging Hadoop workflows is HCFS: https://wiki.apache.org/hadoop/HCFS/ -- the idea that filesystems like S3, OrangeFS, GlusterFileSystem, etc. are all just as capable (although not necessarily optimal) as HDFS of supporting YARN and Hadoop operations.
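To make the first point concrete, here is a minimal sketch of what splitting the single hadoop_cluster_node role into composable roles could look like. This is not the existing Bigtop manifests: the class names, node-name patterns, and the compute/datanode split are illustrative assumptions, and the package names are just placeholders for whatever Bigtop actually installs.

# roles.pp -- illustrative only, not the current Bigtop puppet code

class hadoop::common {
  # shared setup every node needs: java, base hadoop packages, core-site.xml, etc.
}

class hadoop::worker::compute {
  # YARN NodeManager only -- no DataNode -- for the CPU-heavy machines
  include hadoop::common
  package { 'hadoop-yarn-nodemanager': ensure => installed }
  service { 'hadoop-yarn-nodemanager':
    ensure  => running,
    require => Package['hadoop-yarn-nodemanager'],
  }
}

class hadoop::worker::datanode {
  # HDFS DataNode only, for the small subset of machines backing the DFS
  include hadoop::common
  package { 'hadoop-hdfs-datanode': ensure => installed }
  service { 'hadoop-hdfs-datanode':
    ensure  => running,
    require => Package['hadoop-hdfs-datanode'],
  }
}

# site.pp: the 50-machine scenario above -- 40 compute-only nodes, 10 datanodes
node /^compute\d+$/  { include hadoop::worker::compute }
node /^datanode\d+$/ { include hadoop::worker::datanode }

A compute-only role like this would also cover the HCFS case: a node running YARN against S3, GlusterFileSystem, etc. would simply never include the datanode class.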
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)