Posted to mapreduce-dev@hadoop.apache.org by Scott Chen <sc...@fb.com> on 2010/12/09 20:11:57 UTC

Review Request: Raid should rearrange the replicas while raiding

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/160/
-----------------------------------------------------------

Review request for hadoop-mapreduce, Dhruba Borthakur and Ramkumar Vadali.


Summary
-------

Raided files introduce extra dependencies among the blocks on the same stripe.
Therefore we need a new way to place the blocks.
It is desirable that a raided file satisfies the following two conditions:
a. Replicas on the same stripe should be on different machines (or racks)
b. Replicas of the same block should be on different racks

MAPREDUCE-1831 will try to delete the replicas that are on the same stripe and the same machine (a).
At the same time, it will try to maintain the number of distinct racks holding each block (b).
We cannot satisfy (a) and (b) at the same time with the current logic in BlockPlacementPolicyDefault.chooseTarget().

One option is to change BlockPlacementPolicyDefault.chooseTarget().
However, the current placement is generally good for all files, including the unraided ones.
It is not clear to us that we can change it in a way that works well for both raided and unraided files.

So we propose the following: when raiding a file, we first create one more off-rack replica (so replication=4 at that point).
Then we delete two replicas using the policy in MAPREDUCE-1831 (so replication=2 afterwards).
This way we can rearrange the replicas to satisfy (a) and (b) at the same time.
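
To illustrate why the temporary fourth replica helps, here is a minimal standalone sketch. It is not the code in this patch and it does not use any Hadoop API: the class ReplicaRearrangeSketch, the Replica type and chooseTwoToKeep() are hypothetical names invented for this example, and the scoring (a same-rack pair costs more than a stripe collision) is an assumption, not the actual MAPREDUCE-1831 policy. It picks the two replicas of one block to keep, preferring pairs on different racks (b) that avoid machines already holding other blocks of the stripe (a).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReplicaRearrangeSketch {

    /** Hypothetical stand-in for one replica location (datanode plus rack). */
    public static final class Replica {
        public final String node;
        public final String rack;

        public Replica(String node, String rack) {
            this.node = node;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return node + "@" + rack;
        }
    }

    /**
     * Chooses the two replicas of a block to keep; the rest would be deleted.
     * stripeNodes holds the machines used by other blocks of the same stripe.
     * Assumed scoring: +10 if both kept replicas share a rack (violates b),
     * +1 for each kept replica on a stripe machine (violates a).
     */
    public static List<Replica> chooseTwoToKeep(List<Replica> replicas,
                                                Set<String> stripeNodes) {
        if (replicas.size() < 2) {
            throw new IllegalArgumentException("need at least two replicas");
        }
        List<Replica> best = null;
        int bestScore = Integer.MAX_VALUE;
        for (int i = 0; i < replicas.size(); i++) {
            for (int j = i + 1; j < replicas.size(); j++) {
                Replica x = replicas.get(i);
                Replica y = replicas.get(j);
                int score = 0;
                if (x.rack.equals(y.rack)) {
                    score += 10; // condition (b) violated
                }
                if (stripeNodes.contains(x.node)) {
                    score += 1;  // condition (a) violated
                }
                if (stripeNodes.contains(y.node)) {
                    score += 1;  // condition (a) violated
                }
                if (score < bestScore) {
                    bestScore = score;
                    best = Arrays.asList(x, y);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Another block of the stripe already lives on machine n1.
        Set<String> stripeNodes = new HashSet<>(Arrays.asList("n1"));
        // Three existing replicas plus the extra off-rack replica n4 on rack r3.
        List<Replica> replicas = Arrays.asList(
            new Replica("n1", "r1"),
            new Replica("n2", "r2"),
            new Replica("n3", "r2"),
            new Replica("n4", "r3"));
        System.out.println(chooseTwoToKeep(replicas, stripeNodes));
    }
}
```

In this example the three original replicas alone offer no pair that satisfies both conditions: any pair on different racks must include n1, which collides with the stripe. With the extra replica on a third rack, a pair that satisfies both (a) and (b) becomes available.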


Diffs
-----

  trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/DistRaid.java 1040840 
  trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/RaidNode.java 1040840 
  trunk/src/contrib/raid/src/test/org/apache/hadoop/raid/TestRaidNode.java 1040840 

Diff: https://reviews.apache.org/r/160/diff


Testing
-------


Thanks,

Scott


Re: Review Request: Raid should rearrange the replicas while raiding

Posted by Ramkumar Vadali <ra...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/160/#review61
-----------------------------------------------------------



trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/RaidNode.java
<https://reviews.apache.org/r/160/#comment43>

    Check for valid values of transitionRepl.



trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/RaidNode.java
<https://reviews.apache.org/r/160/#comment44>

    Check for the case where the file is not present.



trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/RaidNode.java
<https://reviews.apache.org/r/160/#comment45>

    This should not increase replication of the file. 


- Ramkumar

