Posted to common-user@hadoop.apache.org by Rakhi Khatwani <ra...@gmail.com> on 2009/06/19 12:36:16 UTC

Multicluster Communication

Hi,
I just wanted to know whether multi-cluster communication is possible in Hadoop.
For example, say I have 10 nodes:

Hadoop cluster 1
node1 - Master 1
node2 - slave of master1
node3 - slave of master1
node4 - slave of master1
node5 - slave of master1


Hadoop cluster 2
node6 - Master 2
node7 - slave of master2
node8 - slave of master2
node9 - slave of master2
node10 - slave of master2


We want Hadoop cluster 1 for collecting data and storing it in HDFS, and
Hadoop cluster 2 for reading the stored data from HDFS and analysing it.


Regards,
Raakhi

Re: Multicluster Communication

Posted by Rakhi Khatwani <ra...@gmail.com>.
Hi Harish,

I want both of them to be compute clusters. But yes... how would they have a
common storage area?

We basically want to separate collection from analysis. Is it possible
to dedicate one set of nodes in a Hadoop cluster only to collection and
another set of nodes in the same cluster only to analysis?

Regards
Raakhi

On Fri, Jun 19, 2009 at 4:19 PM, Harish Mallipeddi <harish.mallipeddi@gmail.com> wrote:

> On Fri, Jun 19, 2009 at 4:06 PM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:
>
> >
> > We want Hadoop cluster 1 for collecting data and storing it in HDFS, and
> > Hadoop cluster 2 for reading the stored data from HDFS and analysing it.
> >
>
> Why do you want to do this in the first place? It seems like you want
> cluster1 to be a plain HDFS cluster and cluster2 to be a MapReduce cluster.
> Doing something like that would be disastrous: Hadoop is all about sending
> computation closer to your data. If you don't want that, you need not even
> use Hadoop.
>
>
> --
> Harish Mallipeddi
> http://blog.poundbang.in
>

Re: Multicluster Communication

Posted by Harish Mallipeddi <ha...@gmail.com>.
On Fri, Jun 19, 2009 at 10:37 PM, Allen Wittenauer <aw...@yahoo-inc.com> wrote:

> On 6/19/09 3:49 AM, "Harish Mallipeddi" <ha...@gmail.com>
> wrote:
> > Why do you want to do this in the first place? It seems like you want
> > cluster1 to be a plain HDFS cluster and cluster2 to be a MapReduce cluster.
> > Doing something like that would be disastrous: Hadoop is all about sending
> > computation closer to your data. If you don't want that, you need not even
> > use Hadoop.
>
>     Given some of the limitations with HDFS (quota operability, security), I
> can easily see why it would be desirable to have static data coming from one
> grid while computation, intermediate output, and final output go to another.
>
>    Using performance as your sole metric of viability is a bigger disaster
> waiting to happen.  "Sure, we crashed the file system, but look how fast it
> went down in flames!"
>
>
Well, apart from periodically running distcp between the two clusters, I don't
see how this can be done in a way that would yield acceptable performance.
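The periodic distcp approach could be automated roughly as follows. This is a minimal Python sketch, not something from the thread: the NameNode URIs (node1 and node6 on port 8020), the paths, and the helper names `build_distcp_command` and `sync` are hypothetical, and it assumes a machine with the `hadoop` client on its PATH plus cron (or a similar scheduler) to call it periodically.

```python
import subprocess

# Hypothetical NameNode URIs: node1 is Master 1 (collection cluster) and
# node6 is Master 2 (analysis cluster). Port 8020 is an assumption; use the
# value of fs.default.name from each cluster's configuration.
SOURCE_NAMENODE = "hdfs://node1:8020"
TARGET_NAMENODE = "hdfs://node6:8020"


def build_distcp_command(src_path, dst_path,
                         src_nn=SOURCE_NAMENODE, dst_nn=TARGET_NAMENODE):
    """Build the argv for one distcp run from cluster 1 to cluster 2.

    With -update, distcp copies only files that are missing at the
    destination or differ from it (older releases compare sizes only),
    so a periodic run mostly moves newly collected data.
    """
    return ["hadoop", "distcp", "-update", src_nn + src_path, dst_nn + dst_path]


def sync(src_path, dst_path, dry_run=False):
    """Run one distcp copy, or just print the command when dry_run is True.

    Returns the command's exit status (0 means the copy job succeeded).
    distcp itself runs as a MapReduce job on whichever cluster the local
    Hadoop client is configured to use.
    """
    cmd = build_distcp_command(src_path, dst_path)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)
```

For example, a script run hourly from cron could call `sync("/data/collected", "/data/collected")`, where `/data/collected` is a placeholder path; `dry_run=True` prints the command so the URIs can be checked first. Pointing the client's configuration at cluster 2 when launching the copy would keep the copy job's tasks off the collection cluster.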

-- 
Harish Mallipeddi
http://blog.poundbang.in

Re: Multicluster Communication

Posted by Allen Wittenauer <aw...@yahoo-inc.com>.
On 6/19/09 3:49 AM, "Harish Mallipeddi" <ha...@gmail.com> wrote:
> Why do you want to do this in the first place? It seems like you want
> cluster1 to be a plain HDFS cluster and cluster2 to be a MapReduce cluster.
> Doing something like that would be disastrous: Hadoop is all about sending
> computation closer to your data. If you don't want that, you need not even
> use Hadoop.

    Given some of the limitations with HDFS (quota operability, security), I
can easily see why it would be desirable to have static data coming from one
grid while computation, intermediate output, and final output go to another.

    Using performance as your sole metric of viability is a bigger disaster
waiting to happen.  "Sure, we crashed the file system, but look how fast it
went down in flames!"


Re: Multicluster Communication

Posted by Harish Mallipeddi <ha...@gmail.com>.
On Fri, Jun 19, 2009 at 4:06 PM, Rakhi Khatwani <ra...@gmail.com> wrote:

>
> We want Hadoop cluster 1 for collecting data and storing it in HDFS, and
> Hadoop cluster 2 for reading the stored data from HDFS and analysing it.
>

Why do you want to do this in the first place? It seems like you want
cluster1 to be a plain HDFS cluster and cluster2 to be a MapReduce cluster.
Doing something like that would be disastrous: Hadoop is all about sending
computation closer to your data. If you don't want that, you need not even
use Hadoop.


-- 
Harish Mallipeddi
http://blog.poundbang.in