Posted to common-dev@hadoop.apache.org by gschen <go...@yahoo.com.cn> on 2011/10/11 04:35:23 UTC

Strategy Of Replica

Hi guys,
What do you think of the replication strategy in HDFS? How about letting 
users customize their own replication strategy based on criteria such as 
price, performance, and so on?

Thank you in advance.

Re: Strategy Of Replica

Posted by Steve Loughran <st...@apache.org>.
On 11/10/11 04:49, gschen wrote:

> In HDFS, the only thing we can do to change the replication strategy is
> set the replication factor; we cannot change where a block is stored or
> what type of storage it is stored on. Consider this case: to improve
> download speed, I might want to place block replicas near my location or
> near someone else's location. I mean that users could have more options
> for deciding their block replication strategy.

1. In "Apache Hadoop Goes Realtime at Facebook", Dhruba and others 
discuss their use of alternate block placement policies.

2. Russ Perry did some work on rasterizing PDF files in Hadoop, where 
the final stage (collecting the output and streaming it to the printer) 
was done on a machine next to the printer. He modified DFSClient to 
provide the location data for all blocks, and had his app pick blocks 
off different machines to keep the network busy, to avoid overloading 
any specific machine with disk I/O requests, and to ensure peak 
bandwidth to the final destination machine.

http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
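Steve's second example relies on a client that knows where every replica of every block lives and deliberately spreads its reads. The Java sketch below shows one way to make that choice, assuming a simple greedy rule: each block is read from whichever of its replica hosts has been given the fewest blocks so far. The class, method, block, and host names are hypothetical, and this is not the code from the HP report. In stock Hadoop a client can obtain per-block host lists through FileSystem.getFileBlockLocations() and BlockLocation.getHosts(); the sketch replaces that with a plain map so it runs on its own.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick one replica host per block so reads are spread
// across DataNodes. Input is block id -> hosts holding a replica (assumed
// shape; in Hadoop this data would come from getFileBlockLocations()).
final class BalancedReplicaReader {

    // Greedy least-loaded choice: for each block, take the replica host that has
    // been assigned the fewest blocks so far; ties go to the smaller host name.
    static Map<String, String> assign(Map<String, List<String>> blockHosts) {
        Map<String, Integer> load = new HashMap<>();
        Map<String, String> choice = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : blockHosts.entrySet()) {
            String best = null;
            for (String host : e.getValue()) {
                if (best == null) {
                    best = host;
                    continue;
                }
                int hostLoad = load.getOrDefault(host, 0);
                int bestLoad = load.getOrDefault(best, 0);
                if (hostLoad < bestLoad || (hostLoad == bestLoad && host.compareTo(best) < 0)) {
                    best = host;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("block " + e.getKey() + " has no replicas");
            }
            load.merge(best, 1, Integer::sum);
            choice.put(e.getKey(), best);
        }
        return choice;
    }

    public static void main(String[] args) {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        blocks.put("blk_1", List.of("dn1", "dn2", "dn3"));
        blocks.put("blk_2", List.of("dn1", "dn2", "dn4"));
        blocks.put("blk_3", List.of("dn1", "dn3", "dn4"));
        blocks.put("blk_4", List.of("dn2", "dn3", "dn4"));
        // Prints {blk_1=dn1, blk_2=dn2, blk_3=dn3, blk_4=dn4}
        System.out.println(assign(blocks));
    }
}
```

Counting assigned blocks is only a proxy for the disk I/O load Steve mentions; it ignores block sizes and any load coming from other clients.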

Re: Strategy Of Replica

Posted by Mi...@emc.com.
Replication (block placement) policy in HDFS has been pluggable since Hadoop 0.21:

https://issues.apache.org/jira/browse/HDFS-385
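HDFS-385 turns replica placement into a plug-in point, which is what would let someone implement the price- or performance-driven strategy asked about at the start of this thread. The Java sketch below illustrates the shape of such a plug-in under simplifying assumptions: PlacementPolicy, Candidate, and CostAwarePolicy are hypothetical names, and the price and throughput figures are made-up inputs. Hadoop's actual extension point is the abstract BlockPlacementPolicy class, selected through the dfs.block.replicator.classname property; its chooseTarget() signature is richer than this and has changed between releases, so it is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified plug-in interface (NOT Hadoop's BlockPlacementPolicy).
interface PlacementPolicy {
    // Return up to `replicas` node names that should hold a new block.
    List<String> chooseTargets(int replicas, List<Candidate> candidates);
}

// Illustrative node description with assumed cost and performance metrics.
final class Candidate {
    final String name;
    final double pricePerGb;    // assumed storage cost; lower is better
    final double throughputMbs; // assumed throughput in MB/s; higher is better

    Candidate(String name, double pricePerGb, double throughputMbs) {
        this.name = name;
        this.pricePerGb = pricePerGb;
        this.throughputMbs = throughputMbs;
    }
}

// Ranks candidates by a weighted score that rewards throughput and penalizes price.
final class CostAwarePolicy implements PlacementPolicy {
    private final double priceWeight;
    private final double perfWeight;

    CostAwarePolicy(double priceWeight, double perfWeight) {
        this.priceWeight = priceWeight;
        this.perfWeight = perfWeight;
    }

    double score(Candidate c) {
        return perfWeight * c.throughputMbs - priceWeight * c.pricePerGb;
    }

    @Override
    public List<String> chooseTargets(int replicas, List<Candidate> candidates) {
        Comparator<Candidate> byScore = Comparator.comparingDouble(this::score);
        List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort(byScore.reversed()); // highest score first
        List<String> targets = new ArrayList<>();
        for (Candidate c : ranked) {
            if (targets.size() >= replicas) {
                break;
            }
            targets.add(c.name);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Candidate> nodes = List.of(
                new Candidate("dn-ssd", 0.10, 500),
                new Candidate("dn-hdd", 0.03, 150),
                new Candidate("dn-cloud", 0.02, 80));
        // Performance-leaning weights prefer the fast nodes: [dn-ssd, dn-hdd]
        System.out.println(new CostAwarePolicy(1000, 1).chooseTargets(2, nodes));
        // Price-leaning weights prefer the cheap nodes: [dn-cloud, dn-hdd]
        System.out.println(new CostAwarePolicy(10000, 1).chooseTargets(2, nodes));
    }
}
```

A real policy would also have to respect rack diversity and free space; ranking purely by score could put every replica of a block on the same rack.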


- Milind


On 10/10/11 10:26 PM, "Uma Maheswara Rao G 72686" <ma...@huawei.com>
wrote:

>To get the best performance from Hadoop, we can configure the network
>topology. Based on that, HDFS applies its rack-awareness algorithms when
>writing and reading files. HDFS-2246 will also improve performance by
>reading local replicas directly.
>
>If you have a good algorithm that performs better than the current ones,
>please file a JIRA with your proposed strategy and a design doc. :-)
>
>Thanks & Regards,
>Uma


Re: Strategy Of Replica

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
To get the best performance from Hadoop, we can configure the network topology.
Based on that, HDFS applies its rack-awareness algorithms when writing and reading files.
HDFS-2246 will also improve performance by reading local replicas directly.
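For readers unfamiliar with the rack-aware behaviour Uma refers to: once each DataNode is mapped to a rack (for example /rack1), the default policy for a replication factor of 3, as the HDFS Architecture guide describes it, places one replica on the writer's own node (if the writer runs on a DataNode), one on a node in a different rack, and one on a different node in that same remote rack. The Java sketch below encodes just that rule. The Node class and place() method are hypothetical; the sketch takes the first eligible node in list order, whereas HDFS picks randomly among eligible nodes and applies further checks such as free space, and it is not Hadoop's BlockPlacementPolicyDefault.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of three-replica rack-aware placement; not Hadoop code.
final class RackAwarePlacement {

    // Illustrative DataNode: host name plus the rack path it is mapped to.
    static final class Node {
        final String host;
        final String rack;

        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return rack + "/" + host;
        }
    }

    // 1st replica on the writer's node, 2nd on a node in another rack,
    // 3rd on a different node in that same remote rack (falling back to any
    // unused node if the remote rack has only one). Assumes the writer is a DataNode.
    static List<Node> place(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer);

        Node remote = null;
        for (Node n : cluster) {
            if (!n.rack.equals(writer.rack)) {
                remote = n;
                break;
            }
        }
        if (remote == null) {
            throw new IllegalStateException("no node outside the writer's rack");
        }
        targets.add(remote);

        Node third = null;
        for (Node n : cluster) {
            if (n.rack.equals(remote.rack) && !n.host.equals(remote.host)) {
                third = n;
                break;
            }
        }
        if (third == null) { // fallback: any node not already chosen
            for (Node n : cluster) {
                if (!n.host.equals(writer.host) && !n.host.equals(remote.host)) {
                    third = n;
                    break;
                }
            }
        }
        if (third != null) {
            targets.add(third);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("h1", "/rack1"), new Node("h2", "/rack1"),
                new Node("h3", "/rack2"), new Node("h4", "/rack2"));
        // Writer on h1: prints [/rack1/h1, /rack2/h3, /rack2/h4]
        System.out.println(place(cluster.get(0), cluster));
    }
}
```

On the read side, HDFS orders a block's replicas by network distance from the reader, so a replica on the reader's own node is preferred; HDFS-2246 targets exactly that local case.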

If you have a good algorithm that performs better than the current ones, please file a JIRA with your proposed strategy and a design doc. :-)

Thanks & Regards,
Uma

----- Original Message -----
From: gschen <go...@yahoo.com.cn>
Date: Tuesday, October 11, 2011 9:26 am
Subject: Re: Strategy Of Replica
To: common-dev@hadoop.apache.org

> Thanks for your reply. In HDFS, the only thing we can do to change the
> replication strategy is set the replication factor; we cannot change
> where a block is stored or what type of storage it is stored on.
> Consider this case: to improve download speed, I might want to place
> block replicas near my location or near someone else's location. I mean
> that users could have more options for deciding their block replication
> strategy.
> 

Re: Strategy Of Replica

Posted by gschen <go...@yahoo.com.cn>.
On 2011/10/11 11:13, Uma Maheswara Rao G 72686 wrote:
> I did not quite follow how your proposed strategy would be implemented.
>
> Note that you can already set the replication level per file. If you set a lower replication factor, you obviously gain performance and space, but the risk is also higher. I think your requirements can be handled using the replication factor. Are your expectations different from this?
>
> Regards,
> Uma
Thanks for your reply. In HDFS, the only thing we can do to change the
replication strategy is set the replication factor; we cannot change
where a block is stored or what type of storage it is stored on.
Consider this case: to improve download speed, I might want to place
block replicas near my location or near someone else's location. I mean
that users could have more options for deciding their block replication
strategy.
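gschen's example, placing replicas near a particular user's location, can be expressed with the same tree-shaped topology HDFS already uses for racks. A minimal Java sketch, assuming every node and the preferred location are named by paths such as /dc1/rack1/host1 (the class, method, and path names are hypothetical): distance() counts the hops from each node up to their deepest common ancestor, which to our understanding mirrors how Hadoop's NetworkTopology measures distance (0 for the same node, 2 within a rack, 4 across racks in one data center), and chooseNear() keeps the closest nodes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical location-aware selection over topology paths; not Hadoop code.
final class NearLocationPlacement {

    // Hops between two paths like "/dc1/rack1/h1": steps from each leaf up to
    // their deepest common ancestor (same node 0, same rack 2,
    // other rack in the same data center 4).
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    // Choose up to `replicas` nodes closest to `preferred`; ties broken by name.
    static List<String> chooseNear(String preferred, List<String> nodes, int replicas) {
        List<String> sorted = new ArrayList<>(nodes);
        Comparator<String> byDistance = Comparator.comparingInt(n -> distance(preferred, n));
        sorted.sort(byDistance.thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(sorted.subList(0, Math.min(replicas, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of(
                "/dc1/rack1/h1", "/dc1/rack1/h2", "/dc1/rack2/h3",
                "/dc2/rack3/h4", "/dc2/rack3/h5");
        // Replicas near h4: prints [/dc2/rack3/h4, /dc2/rack3/h5, /dc1/rack1/h1]
        System.out.println(chooseNear("/dc2/rack3/h4", nodes, 3));
    }
}
```

Clustering replicas around one location can sacrifice the cross-rack copy the default policy keeps for fault tolerance, so a real policy would likely combine this distance with a rack-diversity constraint.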

Re: Strategy Of Replica

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
I did not quite follow how your proposed strategy would be implemented.

Note that you can already set the replication level per file. If you set a lower replication factor, you obviously gain performance and space, but the risk is also higher. I think your requirements can be handled using the replication factor. Are your expectations different from this?
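The space-versus-risk trade-off can be made concrete with a little arithmetic: a file stored with replication factor r occupies r times its size in raw DataNode storage, and, as long as the replicas sit on distinct DataNodes, each block survives up to r - 1 simultaneous node failures. The Java sketch below tabulates this for a 1 GiB file; the class and method names are hypothetical. In Hadoop itself the factor is set per file with FileSystem.setReplication(Path, short) or `hadoop fs -setrep`, with dfs.replication as the cluster-wide default.

```java
// Hypothetical helper illustrating the space/risk trade-off of the replication factor.
final class ReplicationTradeoff {

    // Raw bytes consumed across DataNodes by a file stored with `replication` copies.
    static long rawBytes(long fileBytes, short replication) {
        return fileBytes * replication;
    }

    // Simultaneous DataNode failures a block survives, assuming each replica
    // sits on a distinct node.
    static int toleratedNodeFailures(short replication) {
        return replication - 1;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        for (short r = 1; r <= 3; r++) {
            System.out.printf("replication=%d  raw=%d GiB  survives %d node failure(s)%n",
                    r, rawBytes(oneGiB, r) / oneGiB, toleratedNodeFailures(r));
        }
    }
}
```

Lowering r from 3 to 2 saves a third of the raw space but leaves each block able to survive only a single node failure.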

Regards,
Uma
----- Original Message -----
From: gschen <go...@yahoo.com.cn>
Date: Tuesday, October 11, 2011 8:14 am
Subject: Strategy Of Replica
To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>

> Hi guys,
> What do you think of the replication strategy in HDFS? How about letting
> users customize their own replication strategy based on criteria such as
> price, performance, and so on?
> 
> Thank you in advance.
>