Posted to common-user@hadoop.apache.org by Harold Valdivia Garcia <ha...@upr.edu> on 2009/08/09 04:28:01 UTC

How to break a hadoop-cluster in subclusters (how to group physical nodes)?

Hi, everyone. How can I split my cluster into subclusters? I want to break the
cluster into regions (physical groups of nodes): for example, I'd like to have
a region only for sorting, another only for joins, another only for group-by,
and so on. If I submit an "x-task", the JobTracker should send that job to the
"x-region".

Is there a way to make physical groups of TaskTrackers?

Can anyone give me a hint? Thanks to all.
Harold.

-- 
******************************************
Harold Dwight Valdivia Garcia
Graduate Student
M.S Computer Engineering
University of Puerto Rico, Mayaguez Campus
******************************************

Re: How to break a hadoop-cluster in subclusters (how to group physical nodes)?

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Aug 9, 2009 at 8:17 AM, Harold Valdivia Garcia <
harold.valdivia@upr.edu> wrote:

> OK, you mean that I could set up one instance of HDFS and then install
> multiple clusters of TaskTrackers on the same HDFS?


I think so.
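
Roughly (just a sketch; the hostnames here are invented), the client side
would set fs.default.name to the one shared HDFS and mapred.job.tracker to
whichever JobTracker owns the subcluster you want:

    import org.apache.hadoop.mapred.JobConf;

    public class SubclusterConf {
      // One shared HDFS for all jobs; the JobTracker address picks
      // the subcluster. The hostnames are made up for illustration.
      public static JobConf confFor(String jobTrackerHostPort) {
        JobConf conf = new JobConf();
        conf.set("fs.default.name", "hdfs://namenode:9000");
        conf.set("mapred.job.tracker", jobTrackerHostPort); // e.g. "sort-jt:9001"
        return conf;
      }
    }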

> In this configuration, as you say, I'd lose data locality because map tasks
> would consume splits remotely, wouldn't I?


Yes.  You would also lose most of the speed of your cluster, because your
different operations would occur at different times and each would use only
part of the cluster.  If, say, the sort region holds a third of your nodes,
a sort job can use at most a third of your slots even while the other regions
sit idle.  What you are suggesting will make your cluster slower by a
substantial factor.


> In my work, I want to execute each of the relational operations in a query
> plan as a couple of MapReduce tasks and link them together.


This is a very common desire.  But most people do better by having as large
a cluster as possible for all operations.
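
If the goal is a query plan, you can keep one big cluster and simply chain
the stages: run each MapReduce job in sequence and feed the output path of
one stage to the input of the next. A rough sketch with the 0.20 API
(IdentityMapper/IdentityReducer stand in for real join and group-by
operators, and the paths are invented):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class QueryPlan {
      // One stage of the plan. The identity classes are placeholders
      // for real join/group-by mappers and reducers.
      static JobConf stage(String name, Path in, Path out) {
        JobConf conf = new JobConf(QueryPlan.class);
        conf.setJobName(name);
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class); // TextInputFormat keys
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, in);
        FileOutputFormat.setOutputPath(conf, out);
        return conf;
      }

      public static void main(String[] args) throws Exception {
        Path joinOut = new Path("/tmp/plan/join-out");
        // Stage 1: the "join". runJob() blocks until the job finishes.
        JobClient.runJob(stage("join", new Path("/data/input"), joinOut));
        // Stage 2: the "group-by" reads the join's output from HDFS.
        JobClient.runJob(stage("groupby", joinOut, new Path("/data/output")));
      }
    }

Each stage then gets the whole cluster while it runs, instead of being
pinned to one region.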

Re: How to break a hadoop-cluster in subclusters (how to group physical nodes)?

Posted by Harold Valdivia Garcia <ha...@upr.edu>.
OK, you mean that I could set up one instance of HDFS and then install
multiple clusters of TaskTrackers on the same HDFS?

In this configuration, as you say, I'd lose data locality because map tasks
would consume splits remotely, wouldn't I?

In my work, I want to execute each of the relational operations in a query
plan as a couple of MapReduce tasks and link them together.

Thanks for your comment.

On Sun, Aug 9, 2009 at 12:52 AM, Ted Dunning <te...@gmail.com> wrote:

> Why?
>
> I would imagine that you could create multiple clusters of TaskTrackers,
> each associated with a single JobTracker, all of which would use the same
> data cluster composed of a NameNode plus DataNodes.
>
> But what do you think that would buy you?  Most likely you will simply wind
> up with much lower cluster utilization combined with configuration
> headaches.
>
> On Sat, Aug 8, 2009 at 7:28 PM, Harold Valdivia Garcia <
> harold.valdivia@upr.edu> wrote:
>
> > for example, I'd like to have a region only for sorting, another only for
> > joins, another only for group-by
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>



-- 
******************************************
Harold Dwight Valdivia Garcia
Graduate Student
M.S Computer Engineering
University of Puerto Rico, Mayaguez Campus
******************************************

Re: How to break a hadoop-cluster in subclusters (how to group physical nodes)?

Posted by Ted Dunning <te...@gmail.com>.
Why?

I would imagine that you could create multiple clusters of TaskTrackers,
each associated with a single JobTracker, all of which would use the same
data cluster composed of a NameNode plus DataNodes.
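
Concretely (a sketch only, with invented hostnames), each group of
TaskTrackers would point mapred.job.tracker at its own JobTracker, while
core-site.xml on every node names the one shared HDFS:

    <!-- mapred-site.xml on the TaskTrackers in the "sort" group -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>sort-jt:9001</value>
      </property>
    </configuration>

    <!-- core-site.xml on every node: the shared HDFS -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode:9000</value>
      </property>
    </configuration>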

But what do you think that would buy you?  Most likely you will simply wind
up with much lower cluster utilization combined with configuration
headaches.

On Sat, Aug 8, 2009 at 7:28 PM, Harold Valdivia Garcia <
harold.valdivia@upr.edu> wrote:

> for example, I'd like to have a region only for sorting, another only for
> joins, another only for group-by
>



-- 
Ted Dunning, CTO
DeepDyve