Posted to hdfs-user@hadoop.apache.org by Baskar Duraikannu <ba...@outlook.com> on 2013/08/29 21:12:11 UTC

Multidata center support

We have a need to set up Hadoop across data centers. Does Hadoop support a multi data center configuration? I searched through the archives and found that Hadoop did not support multi data center configurations some time back. Just wanted to see whether the situation has changed.
Please help.

RE: Multidata center support

Posted by Baskar Duraikannu <ba...@outlook.com>.
Currently there is no relation between weak consistency and Hadoop. I just spent more time thinking about the requirement (as outlined below):
    a) Maintain a total of 3 data centers
    b) Maintain 1 copy per data center
    c) If any data center goes down, don't create additional copies.
The above is not a valid model, especially requirement (c), because it would take away the "strong consistency" model supported by Hadoop. Hope this explains.
I believe we can give up on requirement (c). I am currently exploring whether there is any way to achieve (a) and (b). Requirement (b) can also be relaxed to allow more copies per data center if needed.
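[A note on requirements (a) and (b): in HDFS the replication setting is a per-file replica count with no notion of location, so "one copy per data center" has no direct configuration hook. The minimal sketch below (hypothetical path, reachable cluster assumed) shows that this API only controls how many replicas exist; where they land is decided by the NameNode's block placement policy.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationFactorSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file, for illustration only.
        Path file = new Path("/data/events/part-00000");

        // The replication factor is just a count (here 3). There is no per-file way to
        // say "one replica in each of three data centers"; that would have to come from
        // the cluster-wide block placement policy, not from this call.
        boolean accepted = fs.setReplication(file, (short) 3);

        short current = fs.getFileStatus(file).getReplication();
        System.out.println("setReplication accepted=" + accepted + ", replication=" + current);

        fs.close();
    }
}

[The cluster-wide default comes from dfs.replication, and re-replication of under-replicated blocks, the behaviour requirement (c) would need to suppress, is driven automatically by the NameNode rather than being switchable per file.]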
From: rahul.rec.dgp@gmail.com
Date: Wed, 4 Sep 2013 10:04:49 +0530
Subject: Re: Multidata center support
To: user@hadoop.apache.org

Under-replicated blocks are also consistent from a consumer's point of view. Care to explain the relation of weak consistency to Hadoop?

Thanks,
Rahul

On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:


Adam's response makes more sense to me: offline-replicate the generated data from one cluster to another across data centers.

Not sure if a configurable block placement policy is supported in Hadoop. If yes, then along with rack awareness, you should be able to achieve the same.

I could not follow your question related to weak consistency.

Thanks,
Rahul
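[On the rack-awareness point: Hadoop resolves every node to a topology path, and that path can in principle carry a data-center level such as /dc1/rack2. Below is a minimal, illustrative mapper, assuming the org.apache.hadoop.net.DNSToSwitchMapping interface roughly as it appears in Hadoop 2.x (the exact method set varies by version); the hostnames, locations, and class name are made up, and the class would be wired in via net.topology.node.switch.mapping.impl in core-site.xml. The default placement policy still only reasons about racks and nodes, which is the gap Jun Ping's HADOOP-8848 reference (quoted below) is about.]

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.net.DNSToSwitchMapping;

/**
 * Illustrative topology mapper that encodes a data-center level into each
 * node's network location, e.g. /dc1/rack2. All names here are hypothetical.
 */
public class DataCenterTopologyMapper implements DNSToSwitchMapping {

    private static final Map<String, String> LOCATIONS = new HashMap<String, String>();
    static {
        LOCATIONS.put("node1.example.com", "/dc1/rack1");
        LOCATIONS.put("node2.example.com", "/dc1/rack2");
        LOCATIONS.put("node3.example.com", "/dc2/rack1");
    }

    @Override
    public List<String> resolve(List<String> names) {
        // Hadoop passes hostnames or IPs; return one topology path per input, in order.
        List<String> paths = new ArrayList<String>(names.size());
        for (String name : names) {
            String location = LOCATIONS.get(name);
            paths.add(location != null ? location : "/default-dc/default-rack");
        }
        return paths;
    }

    @Override
    public void reloadCachedMappings() {
        // Nothing is cached in this sketch.
    }

    @Override
    public void reloadCachedMappings(List<String> names) {
        // Nothing is cached in this sketch (this overload exists only in newer Hadoop versions).
    }
}

[Most deployments use a script configured via net.topology.script.file.name instead of a Java class, but either way the mapping only labels nodes; it does not by itself change where the default policy places replicas.]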






On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <ba...@outlook.com> wrote:






Rahul,
Are you talking about the rack-awareness script?

I did go through rack awareness. Here are the problems with rack awareness w.r.t. my (given) "business requirement":

1. Hadoop, by default, places two copies on the same rack and one copy on another rack. This would work as long as we have two data centers. If the business wants to have three data centers, then data would not be spread across all of them. Separately, there is a question around whether that is the right thing to do or not. I have been promised by the business that they would buy enough bandwidth such that the data centers will be a few milliseconds apart (in latency).

2. I believe Hadoop automatically re-replicates data if one or more nodes are down. Assume one out of two data centers goes down: there will be a massive data flow to create additional copies. When I say data center support, I mean I should be able to configure Hadoop to say:
     a) Maintain 1 copy per data center
     b) If any data center goes down, don't create additional copies.

The requirements I am pointing at would essentially move Hadoop from a strongly consistent to a weak/eventually consistent model. Since this changes the fundamental architecture, it would probably break all sorts of things... It might never be possible in Hadoop.

Thoughts?

Sadak,
Is there a way to implement the above requirement via federation?

Thanks,
Baskar
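[For point 1: the "two copies on one rack, one on another" behaviour comes from HDFS's default block placement policy, and the policy class is pluggable. Writing a data-center-aware replacement means implementing the NameNode-internal BlockPlacementPolicy class, whose method signatures change between releases, so the sketch below only names the (hypothetical) wiring: com.example.DataCenterAwarePlacementPolicy does not exist, and dfs.block.replicator.classname is the key HDFS 2.x appears to use for this hook. Even a custom policy only decides where replicas go; it cannot express point 2's "don't re-replicate after a failure", which is exactly the consistency trade-off described above.]

import org.apache.hadoop.conf.Configuration;

public class PlacementPolicyWiringSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Hypothetical custom policy class. In a real cluster this property would be set
        // in hdfs-site.xml on the NameNode rather than programmatically; it is shown here
        // only to name the hook that a data-center-aware policy would plug into.
        conf.set("dfs.block.replicator.classname",
                 "com.example.DataCenterAwarePlacementPolicy");

        System.out.println("Block placement policy = "
                + conf.get("dfs.block.replicator.classname"));
    }
}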

Date: Sun, 1 Sep 2013 00:20:04 +0530



Subject: Re: Multidata center support
From: visioner.sadak@gmail.com
To: user@hadoop.apache.org




What do you think, friends? I think Hadoop clusters can run on multiple data centers using federation.
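[For context on the federation suggestion: HDFS federation means several independent NameNodes, each owning its own namespace, and clients usually stitch the namespaces together with a viewfs mount table. Neither federation nor viewfs copies blocks between namespaces, so they give a combined namespace across sites rather than cross-site redundancy. A hypothetical client-side sketch, assuming the standard viewfs mount-table configuration keys; the mount table name and NameNode addresses are made up.]

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsFederationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical mount table "corp": one mount point per data center's namespace.
        conf.set("fs.viewfs.mounttable.corp.link./dc1", "hdfs://nn-dc1:8020/");
        conf.set("fs.viewfs.mounttable.corp.link./dc2", "hdfs://nn-dc2:8020/");
        conf.set("fs.defaultFS", "viewfs://corp/");

        FileSystem viewFs = FileSystem.get(URI.create("viewfs://corp/"), conf);

        // Each path resolves to exactly one underlying namespace; nothing here copies
        // data from /dc1 to /dc2, which is why federation alone is not a DR answer.
        System.out.println("/dc1 exists: " + viewFs.exists(new Path("/dc1/")));

        viewFs.close();
    }
}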

On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com> wrote:




The only problem, I guess, is that Hadoop won't be able to duplicate data from one data center to another, but I guess I can identify datanodes or namenodes from another data center. Correct me if I am wrong.






On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com> wrote:





Let's say that you have some machines in Europe and some in the US. I think you just need the IPs and to configure them in your cluster setup; it will work...



On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:






Hi,
    Although you can add a data center layer to your network topology, it is never enabled in Hadoop because replica placement and task scheduling lack support for it. There is some work to add layers other than rack and node under HADOOP-8848, but it may not suit your case. I agree with Adam that a cluster spanning multiple data centers does not seem to make sense even for the DR case. Do you have other cases that require such a deployment?






Thanks,
Junping
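[To illustrate Jun Ping's point: the topology tree itself accepts an extra data-center level in node locations, but the code that consumes it only asks rack-level questions, so the extra layer is effectively ignored by placement and scheduling. A small sketch using the org.apache.hadoop.net.NetworkTopology API (method names as in Hadoop 2.x; node names and paths are hypothetical).]

import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.net.NodeBase;

public class TopologyLayersSketch {
    public static void main(String[] args) {
        NetworkTopology cluster = new NetworkTopology();

        // Locations carry a data-center component in addition to the rack.
        Node a = new NodeBase("node1.example.com", "/dc1/rack1");
        Node b = new NodeBase("node2.example.com", "/dc1/rack2");
        Node c = new NodeBase("node3.example.com", "/dc2/rack1");
        cluster.add(a);
        cluster.add(b);
        cluster.add(c);

        // The topology only exposes rack-level relationships; there is no
        // "isOnSameDataCenter" question for the placement code to ask.
        System.out.println("racks: " + cluster.getNumOfRacks());            // 3 distinct racks
        System.out.println("a,b same rack? " + cluster.isOnSameRack(a, b)); // false
        System.out.println("a,c same rack? " + cluster.isOnSameRack(a, c)); // false
    }
}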
From: "Adam Muise" <am...@hortonworks.com>






To: user@hadoop.apache.org
Sent: Friday, August 30, 2013 6:26:54 PM
Subject: Re: Multidata center support


Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now.
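[The "distributed copy" Adam refers to is DistCp, a MapReduce job that copies data between clusters and is the usual building block for cross-data-center DR. Below is a minimal sketch of driving it from Java via ToolRunner, equivalent to running "hadoop distcp" on the command line; the cluster addresses and paths are hypothetical, and the DistCp constructor signature may differ between Hadoop versions.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.util.ToolRunner;

public class CrossDataCenterCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Roughly: hadoop distcp -update hdfs://nn-dc1:8020/data/events hdfs://nn-dc2:8020/data/events
        String[] distcpArgs = new String[] {
            "-update",                           // copy only files missing or changed on the target
            "hdfs://nn-dc1:8020/data/events",    // source: primary data center (hypothetical address)
            "hdfs://nn-dc2:8020/data/events"     // target: DR data center (hypothetical address)
        };

        // DistCp implements Tool, so ToolRunner parses the arguments and submits the copy job.
        int exitCode = ToolRunner.run(conf, new DistCp(conf, null), distcpArgs);
        System.exit(exitCode);
    }
}

[Scheduling such a job periodically, or on workflow completion, gives the "one cluster per site, replicate between them" pattern without asking HDFS itself to span data centers.]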










On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:







My take on this.

Why does Hadoop have to know about the data center at all? I think it can be installed across multiple data centers; however, topology configuration would be required to tell which node belongs to which data center and switch, for block placement.

Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <ba...@outlook.com> wrote:

We have a need to set up Hadoop across data centers. Does Hadoop support a multi data center configuration? I searched through the archives and found that Hadoop did not support multi data center configurations some time back. Just wanted to see whether the situation has changed.

Please help.




-- 









Adam Muise
Solution Engineer
Hortonworks
amuise@hortonworks.com
416-417-4037

Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.
Hortonworks Virtual Sandbox
Hadoop: Disruptive Possibilities by Jeff Needham













Re: Multidata center support

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Under-replicated blocks are also consistent from a consumer's point of view. Care to
explain the relation of weak consistency to Hadoop?

Thanks,
Rahul


On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> Adam's response makes more sense to me: offline-replicate the generated data
> from one cluster to another across data centers.
>
> Not sure if a configurable block placement policy is supported in Hadoop.
> If yes, then along with rack awareness, you should be able to achieve the same.
>
> I could not follow your question related to weak consistency.
>
> Thanks,
> Rahul
>
>
>
> On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <
> baskar.duraikannu@outlook.com> wrote:
>
>> Rahul,
>> Are you talking about the rack-awareness script?
>>
>> I did go through rack awareness. Here are the problems with rack
>> awareness w.r.t. my (given) "business requirement":
>>
>> 1. Hadoop, by default, places two copies on the same rack and one copy on
>> some other rack. This would work as long as we have two data centers. If the
>> business wants to have three data centers, then data would not be spread
>> across all of them. Separately, there is a question around whether that is the
>> right thing to do or not. I have been promised by the business that they would
>> buy enough bandwidth such that the data centers will be a few milliseconds
>> apart (in latency).
>>
>> 2. I believe Hadoop automatically re-replicates data if one or more nodes
>> are down. Assume one out of two data centers goes down: there will be a
>> massive data flow to create additional copies. When I say data center
>> support, I mean I should be able to configure Hadoop to say
>>      a) Maintain 1 copy per data center
>>      b) If any data center goes down, don't create additional copies.
>>
>> The requirements I am pointing at would essentially move Hadoop from a
>> strongly consistent to a weak/eventually consistent model. Since this changes
>> the fundamental architecture, it would probably break all sorts of things...
>> It might never be possible in Hadoop.
>>
>> Thoughts?
>>
>> Sadak,
>> Is there a way to implement the above requirement via federation?
>>
>> Thanks,
>> Baskar
>>
>>
>> ------------------------------
>> Date: Sun, 1 Sep 2013 00:20:04 +0530
>>
>> Subject: Re: Multidata center support
>> From: visioner.sadak@gmail.com
>> To: user@hadoop.apache.org
>>
>>
>> What do you think, friends? I think Hadoop clusters can run on multiple
>> data centers using federation.
>>
>>
>> On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <visioner.sadak@gmail.com
>> > wrote:
>>
>> The only problem, I guess, is that Hadoop won't be able to duplicate data from one
>> data center to another, but I guess I can identify datanodes or namenodes
>> from another data center. Correct me if I am wrong.
>>
>>
>> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <visioner.sadak@gmail.com
>> > wrote:
>>
>> Let's say that you have some machines in Europe and some in the US.
>> I think you just need the IPs and to configure them in your cluster setup;
>> it will work...
>>
>>
>> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:
>>
>> Hi,
>>     Although you can add a data center layer to your network topology, it is
>> never enabled in Hadoop because replica placement and task scheduling lack
>> support for it. There is some work to add layers other than rack and node under
>> HADOOP-8848, but it may not suit your case. I agree with Adam that a cluster
>> spanning multiple data centers does not seem to make sense even for the DR case.
>> Do you have other cases that require such a deployment?
>>
>> Thanks,
>>
>> Junping
>>
>> ------------------------------
>> From: "Adam Muise" <am...@hortonworks.com>
>> To: user@hadoop.apache.org
>> Sent: Friday, August 30, 2013 6:26:54 PM
>> Subject: Re: Multidata center support
>>
>>
>> Nothing has changed. DR best practice is still one (or more) clusters per
>> site and replication is handled via distributed copy or some variation of
>> it. A cluster spanning multiple data centers is a poor idea right now.
>>
>>
>>
>>
>> On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>> My take on this.
>>
>> Why does Hadoop have to know about the data center at all? I think it can be
>> installed across multiple data centers; however, topology configuration
>> would be required to tell which node belongs to which data center and
>> switch, for block placement.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
>> baskar.duraikannu@outlook.com> wrote:
>>
>> We have a need to set up Hadoop across data centers. Does Hadoop support
>> a multi data center configuration? I searched through the archives and found
>> that Hadoop did not support multi data center configurations some time back.
>> Just wanted to see whether the situation has changed.
>>
>> Please help.
>>
>>
>>
>>
>>
>> --
>> Adam Muise
>> Solution Engineer
>> Hortonworks
>> amuise@hortonworks.com
>> 416-417-4037
>>
>> Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop. <http://hortonworks.com/>
>> Hortonworks Virtual Sandbox <http://hortonworks.com/sandbox>
>> Hadoop: Disruptive Possibilities by Jeff Needham <http://hortonworks.com/resources/?did=72&cat=1>
>>
>


Re: Multidata center support

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Adam's response makes more sense to me: offline-replicate the generated data
from one cluster to another across data centers.

Not sure if a configurable block placement policy is supported in Hadoop.
If yes, then along with rack awareness, you should be able to achieve the same.

I could not follow your question related to weak consistency.

Thanks,
Rahul



On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <
baskar.duraikannu@outlook.com> wrote:

> Rahul,
> Are you talking about the rack-awareness script?
>
> I did go through rack awareness. Here are the problems with rack awareness
> w.r.t. my (given) "business requirement":
>
> 1. Hadoop, by default, places two copies on the same rack and one copy on some
> other rack. This would work as long as we have two data centers. If the
> business wants to have three data centers, then data would not be spread
> across all of them. Separately, there is a question around whether that is the
> right thing to do or not. I have been promised by the business that they would
> buy enough bandwidth such that the data centers will be a few milliseconds
> apart (in latency).
>
> 2. I believe Hadoop automatically re-replicates data if one or more nodes
> are down. Assume one out of two data centers goes down: there will be a
> massive data flow to create additional copies. When I say data center
> support, I mean I should be able to configure Hadoop to say
>      a) Maintain 1 copy per data center
>      b) If any data center goes down, don't create additional copies.
>
> The requirements I am pointing at would essentially move Hadoop from a
> strongly consistent to a weak/eventually consistent model. Since this changes
> the fundamental architecture, it would probably break all sorts of things...
> It might never be possible in Hadoop.
>
> Thoughts?
>
> Sadak,
> Is there a way to implement the above requirement via federation?
>
> Thanks,
> Baskar
>
>
> ------------------------------
> Date: Sun, 1 Sep 2013 00:20:04 +0530
>
> Subject: Re: Multidata center support
> From: visioner.sadak@gmail.com
> To: user@hadoop.apache.org
>
>
> What do you think, friends? I think Hadoop clusters can run on multiple data
> centers using federation.
>
>
> On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com> wrote:
>
> The only problem, I guess, is that Hadoop won't be able to duplicate data from one
> data center to another, but I guess I can identify datanodes or namenodes
> from another data center. Correct me if I am wrong.
>
>
> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com> wrote:
>
> Let's say that you have some machines in Europe and some in the US.
> I think you just need the IPs and to configure them in your cluster setup;
> it will work...
>
>
> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:
>
> Hi,
>     Although you can add a data center layer to your network topology, it is
> never enabled in Hadoop because replica placement and task scheduling lack
> support for it. There is some work to add layers other than rack and node under
> HADOOP-8848, but it may not suit your case. I agree with Adam that a cluster
> spanning multiple data centers does not seem to make sense even for the DR case.
> Do you have other cases that require such a deployment?
>
> Thanks,
>
> Junping
>
> ------------------------------
> From: "Adam Muise" <am...@hortonworks.com>
> To: user@hadoop.apache.org
> Sent: Friday, August 30, 2013 6:26:54 PM
> Subject: Re: Multidata center support
>
>
> Nothing has changed. DR best practice is still one (or more) clusters per
> site and replication is handled via distributed copy or some variation of
> it. A cluster spanning multiple data centers is a poor idea right now.
>
>
>
>
> On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
> My take on this.
>
> Why does Hadoop have to know about the data center at all? I think it can be
> installed across multiple data centers; however, topology configuration
> would be required to tell which node belongs to which data center and
> switch, for block placement.
>
> Thanks,
> Rahul
>
>
> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
> baskar.duraikannu@outlook.com> wrote:
>
> We have a need to set up Hadoop across data centers. Does Hadoop support
> a multi data center configuration? I searched through the archives and found
> that Hadoop did not support multi data center configurations some time back.
> Just wanted to see whether the situation has changed.
>
> Please help.
>
>
>
>
>
> --
> Adam Muise
> Solution Engineer
> Hortonworks
> amuise@hortonworks.com
> 416-417-4037
>
> Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop. <http://hortonworks.com/>
> Hortonworks Virtual Sandbox <http://hortonworks.com/sandbox>
> Hadoop: Disruptive Possibilities by Jeff Needham <http://hortonworks.com/resources/?did=72&cat=1>
>
>

Re: Multidata center support

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Adam's response makes more sense to me to offline replicate generated data
from one cluster to another across data centers.

Not sure if configurable block placement block placement policy is
supported in Hadoop.If yes , then alone side with rack awareness , you
should be able to achieve the same.

I could not follow your question related to weak consistency.

Thanks,
Rahul



On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <
baskar.duraikannu@outlook.com> wrote:

> Rahul
> Are you talking about rack-awareness script?
>
> I did go through rack awareness. Here are the problems with rack awareness
> w.r.to my (given) "business requirment"
>
> 1.  Hadoop , default places two copies on the same rack and 1 copy on some
> other rack.  This would work as long as we have two data centers. if
> business wants to have three data centers, then data would not be spread
> across. Separately there is a question around whether it is the right thing
> to do or not. I have been promised by business that they would buy enough
> bandwidth such that each data center will be few milliseconds apart (in
> latency).
>
> 2. I believe Hadoop automatically re-replicates data if one or more node
> is down. Assume when one out of 2 data center goes down. There will be a
> massive data flow to create additional copies.  When I say data center
> support, I should be able to configure hadoop to say
>      a) Maintain 1 copy per data center
>      b) If any data center goes down, dont create additional copies.
>
> Above requirements that I am pointing will essentially move hadoop from
> strongly consistent to a week/eventual consistent model. Since this changes
> fundamental architecture, it will probably break all sort of things...
> Might not be possible ever in Hadoop.
>
> Thoughts?
>
> Sadak
> Is there a way to implement above requirement via Federation?
>
> Thanks
> Baskar
>
>
> ------------------------------
> Date: Sun, 1 Sep 2013 00:20:04 +0530
>
> Subject: Re: Multidata center support
> From: visioner.sadak@gmail.com
> To: user@hadoop.apache.org
>
>
> What do you think, friends? I think hadoop clusters can run on multiple
> data centers using FEDERATION.
>
>
> On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com>wrote:
>
> The only problem, I guess, is that hadoop won't be able to duplicate data
> from one data center to another, but I guess I can identify data nodes or
> namenodes from another data center. Correct me if I am wrong.
>
>
> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com>wrote:
>
> Let's say that you have some machines in Europe and some in the US. I think
> you just need the IPs and to configure them in your cluster setup, and it
> will work...
>
>
> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:
>
> Hi,
>     Although you can set a datacenter layer in your network topology, it is
> never enabled in Hadoop because replica placement and task scheduling do not
> support it. There is some work to add layers other than rack and node under
> HADOOP-8848, but it may not suit your case. I agree with Adam that a cluster
> spanning multiple data centers does not seem to make sense even for the DR
> case. Do you have other cases that call for such a deployment?
>
> Thanks,
>
> Junping
>
> ------------------------------
> From: "Adam Muise" <am...@hortonworks.com>
> To: user@hadoop.apache.org
> Sent: Friday, August 30, 2013 6:26:54 PM
> Subject: Re: Multidata center support
>
>
> Nothing has changed. DR best practice is still one (or more) clusters per
> site and replication is handled via distributed copy or some variation of
> it. A cluster spanning multiple data centers is a poor idea right now.
>
>
>
>
> On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
> My take on this.
>
> Why does Hadoop have to know about data centers at all? I think it can be
> installed across multiple data centers; however, topology configuration
> would be required to tell which node and switch belong to which data center
> for block placement.
>
> Thanks,
> Rahul
>
>
> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
> baskar.duraikannu@outlook.com> wrote:
>
> We have a need to setup hadoop across data centers.  Does hadoop support
> multi data center configuration? I searched through archives and have found
> that hadoop did not support multi data center configuration some time back.
> Just wanted to see whether situation has changed.
>
> Please help.
>
>
>
>
>

Re: Multidata center support

Posted by Visioner Sadak <vi...@gmail.com>.
Hi friends

Hello Baskar, I think rack awareness and data center awareness are different,
and similarly nodes and data centers are different things from Hadoop's
perspective. Ideally they should be the same (nodes can be in different data
centers, right?), but I think Hadoop does not replicate data across data
centers. I am not sure about this, so can anyone please comment?

Federation can provide different namenodes, so you can create independent
clusters, for example one cluster in one data center and another cluster in a
different data center. But if Hadoop could replicate across data centers, we
would need only one federated cluster for all data centers :) Are any of you
using a single federated cluster across multiple data centers in production?
For example:

CASE 1: one federated cluster across data centers in the US and Europe
(if Hadoop can replicate across data centers)

    NN1 ------ US        DN1 ---- US
    NN2 ------ Europe    DN2 ---- Europe

In this case data can be replicated to DN1 and DN2.

CASE 2: two independent clusters/federations, one per data center in the US
and Europe (if Hadoop cannot replicate across data centers)

    Cluster 1:  NN1 ------ US        DN1 ---- US
    Cluster 2:  NN2 ------ Europe    DN2 ---- Europe

In this case data cannot be replicated to DN2, or vice versa.


Can anyone clarify which of these would be the right and optimal setup for
Hadoop? :)
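
To make case 2 concrete, here is a rough sketch (hostnames and paths are
placeholders): each independent cluster is addressed through its own NameNode
URI, and any cross-site "replication" is an explicit copy job rather than
something HDFS does on its own.

    # Two independent clusters, one per data center; each namespace has its
    # own NameNode URI (placeholders).
    hadoop fs -ls hdfs://nn1.us.example.com:8020/user/data
    hadoop fs -ls hdfs://nn2.eu.example.com:8020/user/data

    # Keeping the sites in sync is then a scheduled copy, not HDFS replication:
    hadoop distcp -update \
        hdfs://nn1.us.example.com:8020/user/data/reports \
        hdfs://nn2.eu.example.com:8020/user/data/reports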





RE: Multidata center support

Posted by Baskar Duraikannu <ba...@outlook.com>.
Rahul,
Are you talking about the rack-awareness script?

I did go through rack awareness. Here are the problems with rack awareness
with respect to my (given) "business requirement":

1. Hadoop, by default, places two copies on the same rack and one copy on
some other rack. This works as long as we have two data centers; if the
business wants three data centers, the data would not be spread across all of
them. Separately, there is a question of whether this is the right thing to
do or not. I have been promised by the business that they will buy enough
bandwidth so that the data centers are only a few milliseconds apart (in
latency).

2. I believe Hadoop automatically re-replicates data if one or more nodes are
down. Assume one out of two data centers goes down: there will be a massive
data flow to create additional copies. When I say data center support, I
should be able to configure Hadoop to
     a) maintain one copy per data center, and
     b) not create additional copies if any data center goes down.
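
Just to illustrate point 2 with standard tooling (the path is a placeholder,
and command names differ slightly between Hadoop 1.x and 2.x): as far as I
know there is no supported switch that stops the NameNode from re-creating
copies when a whole data center drops out.

    # Request the replication you actually want, e.g. 3 copies for 3 sites.
    hadoop fs -setrep -w 3 /data/events

    # After a site outage, the fsck summary shows how many blocks are
    # under-replicated; the NameNode starts copying them automatically once
    # the affected DataNodes are declared dead.
    hadoop fsck /data/events
    hdfs dfsadmin -report
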
The requirements I am describing would essentially move Hadoop from a
strongly consistent to a weak/eventually consistent model. Since this changes
the fundamental architecture, it will probably break all sorts of things...
It might never be possible in Hadoop.

Thoughts?

Sadak,
Is there a way to implement the above requirement via Federation?

Thanks,
Baskar

Date: Sun, 1 Sep 2013 00:20:04 +0530
Subject: Re: Multidata center support
From: visioner.sadak@gmail.com
To: user@hadoop.apache.org

What do you think friends I think hadoop clusters can run on multiple data centers using FEDERATION

On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com> wrote:

The only problem i guess hadoop wont be able to duplicate data from one data center to another but i guess i can identify data nodes or namenodes from another data center correct me if i am wrong



On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com> wrote:


lets say that 
you have some machines in europe and some  in US I think you just need the ips and configure them in your cluster set upit will work...



On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:



Hi,    Although you can set datacenter layer on your network topology, it is never enabled in hadoop as lacking of replica placement and task scheduling support. There are some work to add layers other than rack and node under HADOOP-8848 but may not suit for your case. Agree with Adam that a cluster spanning multiple data centers seems not make sense even for DR case. Do you have other cases to do such a deployment?



Thanks,
Junping
From: "Adam Muise" <am...@hortonworks.com>



To: user@hadoop.apache.org
Sent: Friday, August 30, 2013 6:26:54 PM
Subject: Re: Multidata center support


Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now.







On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:




My take on this.





Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement.







Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <ba...@outlook.com> wrote:









We have a need to setup hadoop across data centers.  Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed.






Please help. 		 	   		  




-- 






Adam MuiseSolution EngineerHortonworks



amuise@hortonworks.com416-417-4037




Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.





Hortonworks Virtual Sandbox





Hadoop: Disruptive Possibilities by Jeff Needham








CONFIDENTIALITY NOTICENOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.








 		 	   		  

RE: Multidata center support

Posted by Baskar Duraikannu <ba...@outlook.com>.
RahulAre you talking about rack-awareness script? 
I did go through rack awareness. Here are the problems with rack awareness w.r.to my (given) "business requirment"
1.  Hadoop , default places two copies on the same rack and 1 copy on some other rack.  This would work as long as we have two data centers. if business wants to have three data centers, then data would not be spread across. Separately there is a question around whether it is the right thing to do or not. I have been promised by business that they would buy enough bandwidth such that each data center will be few milliseconds apart (in latency).
2. I believe Hadoop automatically re-replicates data if one or more node is down. Assume when one out of 2 data center goes down. There will be a massive data flow to create additional copies.  When I say data center support, I should be able to configure hadoop to say      a) Maintain 1 copy per data center     b) If any data center goes down, dont create additional copies.  
Above requirements that I am pointing will essentially move hadoop from strongly consistent to a week/eventual consistent model. Since this changes fundamental architecture, it will probably break all sort of things... Might not be possible ever in Hadoop. 
Thoughts? 
SadakIs there a way to implement above requirement via Federation? 
ThanksBaskar

Date: Sun, 1 Sep 2013 00:20:04 +0530
Subject: Re: Multidata center support
From: visioner.sadak@gmail.com
To: user@hadoop.apache.org

What do you think friends I think hadoop clusters can run on multiple data centers using FEDERATION

On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com> wrote:

The only problem i guess hadoop wont be able to duplicate data from one data center to another but i guess i can identify data nodes or namenodes from another data center correct me if i am wrong



On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com> wrote:


lets say that 
you have some machines in europe and some  in US I think you just need the ips and configure them in your cluster set upit will work...



On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:



Hi,    Although you can set datacenter layer on your network topology, it is never enabled in hadoop as lacking of replica placement and task scheduling support. There are some work to add layers other than rack and node under HADOOP-8848 but may not suit for your case. Agree with Adam that a cluster spanning multiple data centers seems not make sense even for DR case. Do you have other cases to do such a deployment?



Thanks,
Junping
From: "Adam Muise" <am...@hortonworks.com>



To: user@hadoop.apache.org
Sent: Friday, August 30, 2013 6:26:54 PM
Subject: Re: Multidata center support


Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now.







On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:




My take on this.





Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement.







Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <ba...@outlook.com> wrote:









We have a need to setup hadoop across data centers.  Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed.






Please help. 		 	   		  




-- 






Adam MuiseSolution EngineerHortonworks



amuise@hortonworks.com416-417-4037




Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.





Hortonworks Virtual Sandbox





Hadoop: Disruptive Possibilities by Jeff Needham








CONFIDENTIALITY NOTICENOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.








 		 	   		  

RE: Multidata center support

Posted by Baskar Duraikannu <ba...@outlook.com>.
RahulAre you talking about rack-awareness script? 
I did go through rack awareness. Here are the problems with rack awareness w.r.to my (given) "business requirment"
1.  Hadoop , default places two copies on the same rack and 1 copy on some other rack.  This would work as long as we have two data centers. if business wants to have three data centers, then data would not be spread across. Separately there is a question around whether it is the right thing to do or not. I have been promised by business that they would buy enough bandwidth such that each data center will be few milliseconds apart (in latency).
2. I believe Hadoop automatically re-replicates data if one or more node is down. Assume when one out of 2 data center goes down. There will be a massive data flow to create additional copies.  When I say data center support, I should be able to configure hadoop to say      a) Maintain 1 copy per data center     b) If any data center goes down, dont create additional copies.  
Above requirements that I am pointing will essentially move hadoop from strongly consistent to a week/eventual consistent model. Since this changes fundamental architecture, it will probably break all sort of things... Might not be possible ever in Hadoop. 
Thoughts? 
SadakIs there a way to implement above requirement via Federation? 
ThanksBaskar

Date: Sun, 1 Sep 2013 00:20:04 +0530
Subject: Re: Multidata center support
From: visioner.sadak@gmail.com
To: user@hadoop.apache.org

What do you think friends I think hadoop clusters can run on multiple data centers using FEDERATION

On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com> wrote:

The only problem i guess hadoop wont be able to duplicate data from one data center to another but i guess i can identify data nodes or namenodes from another data center correct me if i am wrong



On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com> wrote:


lets say that 
you have some machines in europe and some  in US I think you just need the ips and configure them in your cluster set upit will work...



On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:



Hi,    Although you can set datacenter layer on your network topology, it is never enabled in hadoop as lacking of replica placement and task scheduling support. There are some work to add layers other than rack and node under HADOOP-8848 but may not suit for your case. Agree with Adam that a cluster spanning multiple data centers seems not make sense even for DR case. Do you have other cases to do such a deployment?



Thanks,
Junping
From: "Adam Muise" <am...@hortonworks.com>



To: user@hadoop.apache.org
Sent: Friday, August 30, 2013 6:26:54 PM
Subject: Re: Multidata center support


Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now.







On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:




My take on this.





Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement.







Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <ba...@outlook.com> wrote:









We have a need to setup hadoop across data centers.  Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed.






Please help. 		 	   		  




-- 






Adam MuiseSolution EngineerHortonworks



amuise@hortonworks.com416-417-4037




Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.





Hortonworks Virtual Sandbox





Hadoop: Disruptive Possibilities by Jeff Needham








CONFIDENTIALITY NOTICENOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.








 		 	   		  

RE: Multidata center support

Posted by Baskar Duraikannu <ba...@outlook.com>.
RahulAre you talking about rack-awareness script? 
I did go through rack awareness. Here are the problems with rack awareness w.r.to my (given) "business requirment"
1.  Hadoop , default places two copies on the same rack and 1 copy on some other rack.  This would work as long as we have two data centers. if business wants to have three data centers, then data would not be spread across. Separately there is a question around whether it is the right thing to do or not. I have been promised by business that they would buy enough bandwidth such that each data center will be few milliseconds apart (in latency).
2. I believe Hadoop automatically re-replicates data if one or more nodes are down. Assume one of two data centers goes down: there will be a massive data flow to create additional copies (a rough sketch of the traffic involved follows). When I say data center support, I should be able to configure Hadoop to say: a) maintain one copy per data center; b) if any data center goes down, don't create additional copies.
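To put a rough number on the re-replication concern in point 2, here is a small back-of-the-envelope sketch in Python. Every figure in it (300 TB of logical data, one replica per data center, a 10 Gb/s inter-data-center link) is a hypothetical assumption chosen only to illustrate the order of magnitude:

    # Rough, illustrative estimate of the re-replication traffic when one of
    # three data centers (each holding one replica of every block) is lost.
    # All figures are hypothetical assumptions, not numbers from this thread.

    TB = 10 ** 12                  # bytes per terabyte (decimal)

    logical_data_tb = 300          # assumed logical data set size, in TB
    replicas_per_dc = 1            # assumed: one replica kept per data center
    link_gbit_per_s = 10           # assumed usable inter-DC bandwidth, in Gb/s

    # Losing one data center leaves one replica of every block missing, and the
    # surviving replicas that feed the re-replication live in the other sites.
    missing_bytes = logical_data_tb * TB * replicas_per_dc

    seconds = missing_bytes * 8 / (link_gbit_per_s * 10 ** 9)
    print("~%d TB to re-replicate, ~%d hours at %d Gb/s"
          % (missing_bytes / TB, seconds / 3600, link_gbit_per_s))
    # -> ~300 TB to re-replicate, ~66 hours at 10 Gb/s

Whatever the exact figures, the point stands: losing a data center triggers a WAN-scale copy unless re-replication can somehow be held back, which is exactly the knob being asked for here.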
The requirements I am pointing out above would essentially move Hadoop from a strongly consistent to a weak/eventually consistent model. Since this changes the fundamental architecture, it would probably break all sorts of things... It might never be possible in Hadoop.
Thoughts? 
Sadak,
Is there a way to implement the above requirement via Federation?

Thanks,
Baskar
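As background on the rack-awareness question above, the hook is just an executable that Hadoop calls with one or more IP addresses or hostnames and that prints one network location per argument; it is wired in through the topology script property (net.topology.script.file.name on current releases, topology.script.file.name on older ones). A minimal sketch in Python is below; the host-to-location table is invented for illustration. Note that the default block placement policy only reasons about the rack level (the immediate parent of each node), so the extra data-center level in the paths is not acted on; that limitation is what HADOOP-8848 is about.

    #!/usr/bin/env python
    # Minimal sketch of a topology script for net.topology.script.file.name.
    # Hadoop invokes it with one or more IPs/hostnames as arguments and reads
    # one network location per argument from stdout. The table below is made up.

    import sys

    LOCATIONS = {
        "10.1.0.11": "/dc1/rack1",
        "10.1.0.12": "/dc1/rack2",
        "10.2.0.11": "/dc2/rack1",
        "10.3.0.11": "/dc3/rack1",
    }

    DEFAULT_LOCATION = "/default/rack0"   # fallback for hosts not in the table

    for host in sys.argv[1:]:
        print(LOCATIONS.get(host, DEFAULT_LOCATION))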

Date: Sun, 1 Sep 2013 00:20:04 +0530
Subject: Re: Multidata center support
From: visioner.sadak@gmail.com
To: user@hadoop.apache.org

What do you think friends I think hadoop clusters can run on multiple data centers using FEDERATION

On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com> wrote:

The only problem i guess hadoop wont be able to duplicate data from one data center to another but i guess i can identify data nodes or namenodes from another data center correct me if i am wrong



On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com> wrote:


lets say that 
you have some machines in europe and some  in US I think you just need the ips and configure them in your cluster set up
it will work...



On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:



Hi,    Although you can set datacenter layer on your network topology, it is never enabled in hadoop as lacking of replica placement and task scheduling support. There are some work to add layers other than rack and node under HADOOP-8848 but may not suit for your case. Agree with Adam that a cluster spanning multiple data centers seems not make sense even for DR case. Do you have other cases to do such a deployment?



Thanks,
Junping
From: "Adam Muise" <am...@hortonworks.com>



To: user@hadoop.apache.org
Sent: Friday, August 30, 2013 6:26:54 PM
Subject: Re: Multidata center support


Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now.







On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:




My take on this.





Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement.







Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <ba...@outlook.com> wrote:









We have a need to setup hadoop across data centers.  Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed.






Please help. 		 	   		  




-- 






Adam Muise, Solution Engineer, Hortonworks



amuise@hortonworks.com | 416-417-4037




Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.





Hortonworks Virtual Sandbox





Hadoop: Disruptive Possibilities by Jeff Needham
















 		 	   		  

Re: Multidata center support

Posted by Visioner Sadak <vi...@gmail.com>.
What do you think, friends? I think Hadoop clusters can run on multiple data
centers using Federation.


On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <vi...@gmail.com>wrote:

> The only problem i guess hadoop wont be able to duplicate data from one
> data center to another but i guess i can identify data nodes or namenodes
> from another data center correct me if i am wrong
>
>
> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com>wrote:
>
>> lets say that
>>
>> you have some machines in europe and some  in US I think you just need
>> the ips and configure them in your cluster set up
>> it will work...
>>
>>
>> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:
>>
>>> Hi,
>>>     Although you can set datacenter layer on your network topology, it
>>> is never enabled in hadoop as lacking of replica placement and task
>>> scheduling support. There are some work to add layers other than rack and
>>> node under HADOOP-8848 but may not suit for your case. Agree with Adam that
>>> a cluster spanning multiple data centers seems not make sense even for DR
>>> case. Do you have other cases to do such a deployment?
>>>
>>> Thanks,
>>>
>>> Junping
>>>
>>> ------------------------------
>>> *From: *"Adam Muise" <am...@hortonworks.com>
>>> *To: *user@hadoop.apache.org
>>> *Sent: *Friday, August 30, 2013 6:26:54 PM
>>> *Subject: *Re: Multidata center support
>>>
>>>
>>> Nothing has changed. DR best practice is still one (or more) clusters
>>> per site and replication is handled via distributed copy or some variation
>>> of it. A cluster spanning multiple data centers is a poor idea right now.
>>>
>>>
>>>
>>>
>>> On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <
>>> rahul.rec.dgp@gmail.com> wrote:
>>>
>>>> My take on this.
>>>>
>>>> Why hadoop has to know about data center thing. I think it can be
>>>> installed across multiple data centers , however topology configuration
>>>> would be required to tell which node belongs to which data center and
>>>> switch for block placement.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
>>>> baskar.duraikannu@outlook.com> wrote:
>>>>
>>>>> We have a need to setup hadoop across data centers.  Does hadoop
>>>>> support multi data center configuration? I searched through archives and
>>>>> have found that hadoop did not support multi data center configuration some
>>>>> time back. Just wanted to see whether situation has changed.
>>>>>
>>>>> Please help.
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *
>>> *
>>> *
>>> *
>>> *Adam Muise*
>>> Solution Engineer
>>> *Hortonworks*
>>> amuise@hortonworks.com
>>> 416-417-4037
>>>
>>> Hortonworks - Develops, Distributes and Supports Enterprise Apache
>>> Hadoop. <http://hortonworks.com/>
>>>
>>> Hortonworks Virtual Sandbox <http://hortonworks.com/sandbox>
>>>
>>> Hadoop: Disruptive Possibilities by Jeff Needham<http://hortonworks.com/resources/?did=72&cat=1>
>>>
>>>
>>>
>>
>


Re: Multidata center support

Posted by Visioner Sadak <vi...@gmail.com>.
The only problem, I guess, is that Hadoop won't be able to duplicate data from one
data center to another, but I guess I can identify data nodes or namenodes
from another data center. Correct me if I am wrong.


On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <vi...@gmail.com>wrote:

> lets say that
>
> you have some machines in europe and some  in US I think you just need the
> ips and configure them in your cluster set up
> it will work...
>
>
> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:
>
>> Hi,
>>     Although you can set datacenter layer on your network topology, it is
>> never enabled in hadoop as lacking of replica placement and task scheduling
>> support. There are some work to add layers other than rack and node under
>> HADOOP-8848 but may not suit for your case. Agree with Adam that a cluster
>> spanning multiple data centers seems not make sense even for DR case. Do
>> you have other cases to do such a deployment?
>>
>> Thanks,
>>
>> Junping
>>
>> ------------------------------
>> *From: *"Adam Muise" <am...@hortonworks.com>
>> *To: *user@hadoop.apache.org
>> *Sent: *Friday, August 30, 2013 6:26:54 PM
>> *Subject: *Re: Multidata center support
>>
>>
>> Nothing has changed. DR best practice is still one (or more) clusters per
>> site and replication is handled via distributed copy or some variation of
>> it. A cluster spanning multiple data centers is a poor idea right now.
>>
>>
>>
>>
>> On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>> My take on this.
>>>
>>> Why hadoop has to know about data center thing. I think it can be
>>> installed across multiple data centers , however topology configuration
>>> would be required to tell which node belongs to which data center and
>>> switch for block placement.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
>>> baskar.duraikannu@outlook.com> wrote:
>>>
>>>> We have a need to setup hadoop across data centers.  Does hadoop
>>>> support multi data center configuration? I searched through archives and
>>>> have found that hadoop did not support multi data center configuration some
>>>> time back. Just wanted to see whether situation has changed.
>>>>
>>>> Please help.
>>>>
>>>
>>>
>>
>>
>> --
>> *
>> *
>> *
>> *
>> *Adam Muise*
>> Solution Engineer
>> *Hortonworks*
>> amuise@hortonworks.com
>> 416-417-4037
>>
>> Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.<http://hortonworks.com/>
>>
>> Hortonworks Virtual Sandbox <http://hortonworks.com/sandbox>
>>
>> Hadoop: Disruptive Possibilities by Jeff Needham<http://hortonworks.com/resources/?did=72&cat=1>
>>
>>
>>
>


Re: Multidata center support

Posted by Visioner Sadak <vi...@gmail.com>.
Let's say that

you have some machines in Europe and some in the US. I think you just need the
IPs and configure them in your cluster set up;
it will work...


On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <jd...@vmware.com> wrote:

> Hi,
>     Although you can set datacenter layer on your network topology, it is
> never enabled in hadoop as lacking of replica placement and task scheduling
> support. There are some work to add layers other than rack and node under
> HADOOP-8848 but may not suit for your case. Agree with Adam that a cluster
> spanning multiple data centers seems not make sense even for DR case. Do
> you have other cases to do such a deployment?
>
> Thanks,
>
> Junping
>
> ------------------------------
> *From: *"Adam Muise" <am...@hortonworks.com>
> *To: *user@hadoop.apache.org
> *Sent: *Friday, August 30, 2013 6:26:54 PM
> *Subject: *Re: Multidata center support
>
>
> Nothing has changed. DR best practice is still one (or more) clusters per
> site and replication is handled via distributed copy or some variation of
> it. A cluster spanning multiple data centers is a poor idea right now.
>
>
>
>
> On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> My take on this.
>>
>> Why hadoop has to know about data center thing. I think it can be
>> installed across multiple data centers , however topology configuration
>> would be required to tell which node belongs to which data center and
>> switch for block placement.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
>> baskar.duraikannu@outlook.com> wrote:
>>
>>> We have a need to setup hadoop across data centers.  Does hadoop support
>>> multi data center configuration? I searched through archives and have found
>>> that hadoop did not support multi data center configuration some time back.
>>> Just wanted to see whether situation has changed.
>>>
>>> Please help.
>>>
>>
>>
>
>
> --
> *
> *
> *
> *
> *Adam Muise*
> Solution Engineer
> *Hortonworks*
> amuise@hortonworks.com
> 416-417-4037
>
> Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.<http://hortonworks.com/>
>
> Hortonworks Virtual Sandbox <http://hortonworks.com/sandbox>
>
> Hadoop: Disruptive Possibilities by Jeff Needham<http://hortonworks.com/resources/?did=72&cat=1>
>
>
>


Re: Multidata center support

Posted by Jun Ping Du <jd...@vmware.com>.
Hi,
Although you can define a data-center layer in your network topology, it is never acted on by Hadoop, because replica placement and task scheduling lack support for it. There is some work to add layers other than rack and node under HADOOP-8848, but it may not suit your case. I agree with Adam that a cluster spanning multiple data centers does not make sense even for the DR case. Do you have other use cases that call for such a deployment?

Thanks, 

Junping 

----- Original Message -----

From: "Adam Muise" <am...@hortonworks.com> 
To: user@hadoop.apache.org 
Sent: Friday, August 30, 2013 6:26:54 PM 
Subject: Re: Multidata center support 

Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now. 




On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee < rahul.rec.dgp@gmail.com > wrote: 



My take on this. 

Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement. 

Thanks, 
Rahul 


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu < baskar.duraikannu@outlook.com > wrote: 


We have a need to setup hadoop across data centers. Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed. 

Please help. 









-- 


Adam Muise 
Solution Engineer 
Hortonworks 
amuise@hortonworks.com 
416-417-4037 

Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop. 

Hortonworks Virtual Sandbox 

Hadoop: Disruptive Possibilities by Jeff Needham 



RE: Multidata center support

Posted by Baskar Duraikannu <ba...@outlook.com>.
Thanks, Mike. I am assuming that it is a poor idea due to network bandwidth constraints across data centers (the backplane speed of a top-of-rack switch is typically much greater than the inter-data-center connectivity).
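The per-site-clusters-plus-copy pattern that Adam and Mike describe usually comes down to scheduling distcp between otherwise independent clusters. Below is a rough sketch of such a driver in Python; the NameNode addresses, paths and mapper cap are placeholders, and in practice the job would be kicked off by whatever scheduler (cron, Oozie, and so on) is already in place:

    #!/usr/bin/env python
    # Sketch of a periodic DR copy between two independent clusters via distcp.
    # The cluster addresses and paths below are placeholders, not real endpoints.

    import subprocess

    SRC = "hdfs://nn-dc1.example.com:8020/data/events"   # assumed source path
    DST = "hdfs://nn-dc2.example.com:8020/data/events"   # assumed DR path

    cmd = [
        "hadoop", "distcp",
        "-update",        # only copy files that are new or have changed
        "-m", "20",       # cap the number of copy mappers to limit WAN load
        SRC, DST,
    ]

    # Fail loudly so the surrounding scheduler can alert on a broken copy.
    subprocess.check_call(cmd)

When the two clusters run different Hadoop versions, the source side is often read over webhdfs:// (or hftp:// on older releases) instead of hdfs://.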
From: michael_segel@hotmail.com
Subject: Re: Multidata center support
Date: Wed, 4 Sep 2013 20:15:08 -0500
To: user@hadoop.apache.org

Sorry, its a poor idea period. 
Its one thing for something like Cleversafe to span a data center, but you're also having unit of work in terms of map/reduce. 
Think about all of the bad things that can happen when you have to deal with a sort/shuffle stage across data centers... (Its not a pretty sight.) 
As Adam points out... DR and copies across data centers are one thing. Running a single cluster spanning data centers...
I would hate to be you when you have to face your devOps team. Does the expression BOFH ring a bell? ;-) 
HTH
-Mike
On Aug 30, 2013, at 5:26 AM, Adam Muise <am...@hortonworks.com> wrote:
Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now.




On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:

My take on this.


Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement.




Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <ba...@outlook.com> wrote:






We have a need to setup hadoop across data centers.  Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed.



Please help. 		 	   		  




-- 



Adam Muise, Solution Engineer, Hortonworks
amuise@hortonworks.com | 416-417-4037

Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.


Hortonworks Virtual Sandbox


Hadoop: Disruptive Possibilities by Jeff Needham






The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com






 		 	   		  


Re: Multidata center support

Posted by Michael Segel <mi...@hotmail.com>.
Sorry, it's a poor idea, period. 

It's one thing for something like Cleversafe to span data centers, but you also have the unit of work to consider in terms of map/reduce. 

Think about all of the bad things that can happen when you have to deal with a sort/shuffle stage across data centers... 
(It's not a pretty sight.) 
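
As a purely illustrative back-of-envelope (hypothetical numbers, not from any real cluster): shuffling 1 TB of intermediate data over a shared 1 Gbps inter-data-center link needs roughly

    1 TB = 8,000 Gb; 8,000 Gb / 1 Gbps = 8,000 s, about 2.2 hours

of wire time for the shuffle alone, versus on the order of 13 minutes over 10 Gbps top-of-rack links within a single site. And that is before retries, speculative tasks, and everything else competing for the WAN link.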

As Adam points out... DR and copies across data centers are one thing. 
Running a single cluster spanning data centers...

I would hate to be you when you have to face your devOps team. Does the expression BOFH ring a bell? ;-) 

HTH

-Mike

On Aug 30, 2013, at 5:26 AM, Adam Muise <am...@hortonworks.com> wrote:

> Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now.
> 
> 
> 
> 
> On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
> My take on this.
> 
> Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement.
> 
> Thanks,
> Rahul
> 
> 
> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <ba...@outlook.com> wrote:
> We have a need to setup hadoop across data centers.  Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed.
> 
> Please help.
> 
> 
> 
> 
> -- 
> 
> 
> Adam Muise
> Solution Engineer
> Hortonworks
> amuise@hortonworks.com
> 416-417-4037
> 
> Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.
> 
> Hortonworks Virtual Sandbox
> 
> Hadoop: Disruptive Possibilities by Jeff Needham
> 
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: Multidata center support

Posted by Jun Ping Du <jd...@vmware.com>.
Hi, 
Although you can add a data center layer to your network topology, it is never enabled in Hadoop, which lacks the replica placement and task scheduling support for it. There is some work to add layers other than rack and node under HADOOP-8848, but it may not suit your case. I agree with Adam that a cluster spanning multiple data centers does not seem to make sense even for the DR case. Do you have other cases that call for such a deployment? 
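
For what it's worth, a topology script can already return multi-level paths such as /dc1/rack1, and something like the command below will show them, but as far as I know the default block placement policy only treats the innermost level as the "rack" and gives no data-center-level replica guarantees. The hostnames and addresses here are made up for illustration:

    $ hdfs dfsadmin -printTopology
    Rack: /dc1/rack1
       10.1.1.11:50010 (worker-11.example.com)
       10.1.1.12:50010 (worker-12.example.com)
    Rack: /dc2/rack1
       10.2.1.21:50010 (worker-21.example.com)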

Thanks, 

Junping 

----- Original Message -----

From: "Adam Muise" <am...@hortonworks.com> 
To: user@hadoop.apache.org 
Sent: Friday, August 30, 2013 6:26:54 PM 
Subject: Re: Multidata center support 

Nothing has changed. DR best practice is still one (or more) clusters per site and replication is handled via distributed copy or some variation of it. A cluster spanning multiple data centers is a poor idea right now. 




On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee < rahul.rec.dgp@gmail.com > wrote: 



My take on this. 

Why hadoop has to know about data center thing. I think it can be installed across multiple data centers , however topology configuration would be required to tell which node belongs to which data center and switch for block placement. 

Thanks, 
Rahul 


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu < baskar.duraikannu@outlook.com > wrote: 


We have a need to setup hadoop across data centers. Does hadoop support multi data center configuration? I searched through archives and have found that hadoop did not support multi data center configuration some time back. Just wanted to see whether situation has changed. 

Please help. 









-- 


Adam Muise 
Solution Engineer 
Hortonworks 
amuise@hortonworks.com 
416-417-4037 

Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop. 

Hortonworks Virtual Sandbox 

Hadoop: Disruptive Possibilities by Jeff Needham 

CONFIDENTIALITY NOTICE 
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. 


Re: Multidata center support

Posted by Adam Muise <am...@hortonworks.com>.
Nothing has changed. DR best practice is still one (or more) clusters per
site, with replication handled via DistCp (distributed copy) or some variation of
it. A cluster spanning multiple data centers is a poor idea right now.
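
A minimal sketch of such a copy job, assuming a hypothetical DR cluster whose NameNode is reachable at nn-dr.example.com (the hosts and paths are made up; tune -m, the number of copy maps, to what your WAN link can tolerate):

    # Copy /data/events from the primary cluster to the DR cluster,
    # transferring only files that are new or have changed at the source.
    hadoop distcp -update -m 20 \
        hdfs://nn-primary.example.com:8020/data/events \
        hdfs://nn-dr.example.com:8020/data/events

Typically you would schedule something like this (cron, Oozie, etc.) after each ingest or on a fixed interval.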




On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> My take on this.
>
> Why hadoop has to know about data center thing. I think it can be
> installed across multiple data centers , however topology configuration
> would be required to tell which node belongs to which data center and
> switch for block placement.
>
> Thanks,
> Rahul
>
>
> On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
> baskar.duraikannu@outlook.com> wrote:
>
>> We have a need to setup hadoop across data centers.  Does hadoop support
>> multi data center configuration? I searched through archives and have found
>> that hadoop did not support multi data center configuration some time back.
>> Just wanted to see whether situation has changed.
>>
>> Please help.
>>
>
>


-- 
Adam Muise
Solution Engineer
Hortonworks
amuise@hortonworks.com
416-417-4037

Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop. <http://hortonworks.com/>

Hortonworks Virtual Sandbox <http://hortonworks.com/sandbox>

Hadoop: Disruptive Possibilities by Jeff Needham <http://hortonworks.com/resources/?did=72&cat=1>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Multidata center support

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
My take on this.

Why does Hadoop have to know about the data center at all? I think it can be installed
across multiple data centers; however, a topology configuration would be
required to tell which data center and switch each node belongs to, so that
block placement can take it into account.
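
A minimal sketch of such a topology script, assuming each data center has its own hypothetical subnet. Hadoop calls the script with one or more node addresses and expects one network path per address on stdout, and you point net.topology.script.file.name (topology.script.file.name on older releases) in core-site.xml at it:

    #!/bin/bash
    # Hypothetical mapping from node address to a /datacenter/rack path.
    # One path is printed per argument passed in by the NameNode.
    for node in "$@"; do
      case "$node" in
        10.1.*) echo "/dc1/rack1" ;;
        10.2.*) echo "/dc2/rack1" ;;
        *)      echo "/default-rack" ;;
      esac
    done

Keep in mind this only labels nodes; as discussed elsewhere in this thread, the default placement policy will not use the data-center level for replica decisions.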

Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu <
baskar.duraikannu@outlook.com> wrote:

> We have a need to setup hadoop across data centers.  Does hadoop support
> multi data center configuration? I searched through archives and have found
> that hadoop did not support multi data center configuration some time back.
> Just wanted to see whether situation has changed.
>
> Please help.
>
