Posted to hdfs-user@hadoop.apache.org by Lars Francke <la...@gmail.com> on 2019/07/04 09:15:11 UTC

BlockPlacementPolicy question with hierarchical topology

Hi,

I have a customer who wants to make sure that copies of their data are
distributed across datacenters, so they use rack names like
/dc1/rack1, /dc1/rack2, /dc2/rack1, etc.

Unfortunately, BlockPlacementPolicyDefault sometimes seems to place all
replicas of a block on /dc1/*.

Is there a way to guarantee that /dc1/* and /dc2/* will be used in this
scenario?

Looking at chooseRandomWithStorageTypeTwoTrial, it seems to consider the
full "scope" (e.g. /dc1/rack1) rather than its components (/dc1 and rack1).
I couldn't find anything in the code, but I hope I'm missing something: is
there a way to configure HDFS for the behaviour I'd like?
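A simplified model can make the reported behaviour concrete. It rests on assumptions, not on a reading of the actual BlockPlacementPolicyDefault code:

- replica 1 lands on the writer's rack;
- replica 2 goes to a uniformly random other rack;
- replica 3 joins replica 2's rack;
- all racks are equally sized, and rack paths are compared only as whole strings.

The helper datacenterOf is hypothetical and is used only to score the outcome. Under this model, 2 of the 5 remote racks keep all three replicas in /dc1.

```java
import java.util.List;

/**
 * Simplified model (an assumption, not the real BlockPlacementPolicyDefault):
 * replica 1 on the writer's rack, replica 2 on a uniformly random other rack,
 * replica 3 on replica 2's rack. Rack paths are compared as opaque strings,
 * so "/dc1/rack2" counts as remote from "/dc1/rack1".
 */
public class DefaultPlacementModel {

    // Hypothetical helper: first path component, e.g. "/dc1/rack2" -> "/dc1".
    static String datacenterOf(String rackPath) {
        int second = rackPath.indexOf('/', 1);
        return second < 0 ? rackPath : rackPath.substring(0, second);
    }

    /** Fraction of remote-rack choices that keep all three replicas in the writer's datacenter. */
    static double sameDatacenterFraction(List<String> racks, String writerRack) {
        int remote = 0;
        int sameDc = 0;
        for (String rack : racks) {
            if (rack.equals(writerRack)) {
                continue; // the whole-string check only excludes the exact writer rack
            }
            remote++;
            if (datacenterOf(rack).equals(datacenterOf(writerRack))) {
                sameDc++; // replicas 2 and 3 would land here, still inside the writer's DC
            }
        }
        return (double) sameDc / remote;
    }

    public static void main(String[] args) {
        List<String> racks = List.of(
            "/dc1/rack1", "/dc1/rack2", "/dc1/rack3",
            "/dc2/rack1", "/dc2/rack2", "/dc2/rack3");
        // 2 of the 5 remote racks are under /dc1, so this prints 0.4
        System.out.println(sameDatacenterFraction(racks, "/dc1/rack1"));
    }
}
```

Real placement chooses nodes rather than racks, so racks with more DataNodes would be picked more often than this uniform model suggests.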

Thanks!

Lars

Re: BlockPlacementPolicy question with hierarchical topology

Posted by Lars Francke <la...@gmail.com>.
Hi Takanobu,

Sorry for the late reply; I missed your email.

That looks promising. I'll take a look at it.

Cheers,
Lars

On Fri, Jul 5, 2019 at 6:47 AM Takanobu Asanuma <ta...@yahoo-corp.jp>
wrote:

> Hi Lars,
>
> I investigated it further. BlockPlacementPolicyWithNodeGroup may achieve
> your goal.
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
>
> I don't have any experience with this placement policy, but I found a
> slide deck from CACIB describing a use case.
>
> https://www.slideshare.net/Hadoop_Summit/disaster-recovery-experience-at-cacib-hardening-hadoop-for-critical-financial-applications
>
> Thanks,
> - Takanobu
>
> ________________________________________
> From: Takanobu Asanuma <ta...@yahoo-corp.jp>
> Sent: Thursday, July 4, 2019 8:29:23 PM
> To: Lars Francke
> Cc: hdfs-user@hadoop.apache.org
> Subject: Re: BlockPlacementPolicy question with hierarchical topology
>
> Oh, you are right. It doesn't meet your needs. Sorry for the confusion.
> It seems it may be difficult to achieve this with the existing policies.
>
> - Takanobu
>
> ________________________________________
> From: Lars Francke <la...@gmail.com>
> Sent: Thursday, July 4, 2019 7:53:35 PM
> To: 浅沼 孝信
> Cc: hdfs-user@hadoop.apache.org
> Subject: Re: BlockPlacementPolicy question with hierarchical topology
>
> Hi Takanobu,
>
> thanks for the quick reply. I missed that class.
>
> But does it really do what I need?
> If I have these racks:
> /dc1/rack1
> /dc1/rack2
> /dc1/rack3
> /dc2/rack1
> /dc2/rack2
> /dc2/rack3
>
> And if I place a single block in HDFS, couldn't this policy choose
> /dc1/rack1, /dc1/rack2 and /dc1/rack3 at random?
>
> Cheers,
> Lars
>
> On Thu, Jul 4, 2019 at 12:46 PM Takanobu Asanuma <tasanuma@yahoo-corp.jp>
> wrote:
> Hi Lars,
>
> I think BlockPlacementPolicyRackFaultTolerant can do it.
> This policy tries to place the 3 replicas on 3 different racks.
>
> <property>
>   <name>dfs.block.replicator.classname</name>
>   <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant</value>
> </property>
>
> See also:
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRackFaultTolerant.java
>
> Thanks,
> - Takanobu
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: hdfs-user-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: hdfs-user-help@hadoop.apache.org
>
>

Re: BlockPlacementPolicy question with hierarchical topology

Posted by Takanobu Asanuma <ta...@yahoo-corp.jp>.
Hi Lars,

I investigated it further. BlockPlacementPolicyWithNodeGroup may achieve your goal.
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java

I don't have any experience with this placement policy, but I found a slide deck from CACIB describing a use case:
https://www.slideshare.net/Hadoop_Summit/disaster-recovery-experience-at-cacib-hardening-hadoop-for-critical-financial-applications
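The following is one possible reading of why the node-group policy could help. It is an assumption, not a verified description of the code.

With a node-group-aware topology (net.topology.impl set to org.apache.hadoop.net.NetworkTopologyWithNodeGroup), a location such as /dc2/rack3 may be read as rack /dc2 plus node group rack3. If so, the "second replica on a different rack" step can only pick the other datacenter when there are just two of them.

The plain-Java sketch mimics that split. The rackLayer and remoteRackCandidates helpers are hypothetical and do not call Hadoop.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch under an assumption: a node-group-aware topology reads "/dc2/rack3"
 * as rack "/dc2" plus node group "rack3". This only mimics that split in
 * plain Java; it does not use Hadoop classes.
 */
public class NodeGroupLayering {

    // Assumed "rack" layer: everything before the last path component.
    static String rackLayer(String location) {
        return location.substring(0, location.lastIndexOf('/'));
    }

    /** Locations a "different rack" rule could pick for replica 2, given the writer's location. */
    static List<String> remoteRackCandidates(List<String> locations, String writer) {
        String writerRack = rackLayer(writer);
        return locations.stream()
            .filter(loc -> !rackLayer(loc).equals(writerRack))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> locations = List.of(
            "/dc1/rack1", "/dc1/rack2", "/dc1/rack3",
            "/dc2/rack1", "/dc2/rack2", "/dc2/rack3");
        // prints [/dc2/rack1, /dc2/rack2, /dc2/rack3]: every candidate is in the other DC
        System.out.println(remoteRackCandidates(locations, "/dc1/rack2"));
    }
}
```

If this reading holds, the physical racks survive as node groups, so replicas inside /dc2 could still be kept on different physical racks.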

Thanks,
- Takanobu

Re: BlockPlacementPolicy question with hierarchical topology

Posted by Takanobu Asanuma <ta...@yahoo-corp.jp>.
Oh, you are right. It doesn't meet your needs. Sorry for the confusion.
It seems it may be difficult to achieve this with the existing policies.

- Takanobu
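Since none of the existing policies appears to enforce a datacenter spread, one fallback is to check placement after the fact.

The hypothetical helper flags blocks whose replica rack paths all share one datacenter prefix. It is not part of HDFS. How the per-replica rack paths are collected (for example from hdfs fsck with -racks) is left open, and the block IDs in main are made-up sample input.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Hypothetical audit helper (not part of HDFS): flags blocks whose replicas
 * all sit under the same top-level path component, e.g. /dc1.
 */
public class DatacenterSpreadCheck {

    // First path component of a rack path: "/dc1/rack2" -> "/dc1".
    static String datacenterOf(String rackPath) {
        int second = rackPath.indexOf('/', 1);
        return second < 0 ? rackPath : rackPath.substring(0, second);
    }

    static boolean spansMultipleDatacenters(List<String> replicaRacks) {
        Set<String> datacenters = new HashSet<>();
        for (String rack : replicaRacks) {
            datacenters.add(datacenterOf(rack));
        }
        return datacenters.size() >= 2;
    }

    /** Block IDs whose replicas are confined to a single datacenter. */
    static List<String> singleDatacenterBlocks(Map<String, List<String>> racksByBlock) {
        return racksByBlock.entrySet().stream()
            .filter(e -> !spansMultipleDatacenters(e.getValue()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample input: block ID -> rack path of each replica.
        Map<String, List<String>> racksByBlock = new TreeMap<>();
        racksByBlock.put("blk_A", List.of("/dc1/rack1", "/dc2/rack2", "/dc2/rack2"));
        racksByBlock.put("blk_B", List.of("/dc1/rack1", "/dc1/rack3", "/dc1/rack3"));
        System.out.println(singleDatacenterBlocks(racksByBlock)); // prints [blk_B]
    }
}
```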

Re: BlockPlacementPolicy question with hierarchical topology

Posted by Lars Francke <la...@gmail.com>.
Hi Takanobu,

thanks for the quick reply. I missed that class.

But does it really do what I need?
If I have these racks:
/dc1/rack1
/dc1/rack2
/dc1/rack3
/dc2/rack1
/dc2/rack2
/dc2/rack3

And if I place a single block in HDFS, couldn't this policy choose
/dc1/rack1, /dc1/rack2 and /dc1/rack3 at random?
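To put a number on this question, the sketch enumerates every way of picking 3 distinct racks out of the six listed.

It assumes, as a simplification, that the policy chooses the 3 racks uniformly at random and ignores the /dcN prefix. The real policy also weighs writer locality and node counts. Under that assumption, 2 of the 20 combinations keep all replicas in a single datacenter. The datacenterOf helper is hypothetical.

```java
import java.util.List;

/**
 * Counts 3-rack combinations that fall entirely inside one datacenter,
 * assuming (as a simplification) 3 distinct racks chosen uniformly at random
 * with no awareness of the datacenter prefix.
 */
public class RackSubsetCount {

    // Hypothetical helper: first path component, e.g. "/dc1/rack2" -> "/dc1".
    static String datacenterOf(String rackPath) {
        int second = rackPath.indexOf('/', 1);
        return second < 0 ? rackPath : rackPath.substring(0, second);
    }

    /** Returns {singleDatacenterSubsets, totalSubsets} over all 3-element rack subsets. */
    static int[] countSubsets(List<String> racks) {
        int single = 0;
        int total = 0;
        int n = racks.size();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                for (int k = j + 1; k < n; k++) {
                    total++;
                    String dc = datacenterOf(racks.get(i));
                    if (dc.equals(datacenterOf(racks.get(j)))
                            && dc.equals(datacenterOf(racks.get(k)))) {
                        single++; // all three replicas would share one datacenter
                    }
                }
            }
        }
        return new int[] {single, total};
    }

    public static void main(String[] args) {
        List<String> racks = List.of(
            "/dc1/rack1", "/dc1/rack2", "/dc1/rack3",
            "/dc2/rack1", "/dc2/rack2", "/dc2/rack3");
        int[] counts = countSubsets(racks);
        System.out.println(counts[0] + " of " + counts[1]); // prints "2 of 20"
    }
}
```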

Cheers,
Lars

Re: BlockPlacementPolicy question with hierarchical topology

Posted by Takanobu Asanuma <ta...@yahoo-corp.jp>.
Hi Lars,

I think BlockPlacementPolicyRackFaultTolerant can do it.
This policy tries to place the 3 replicas on 3 different racks.

<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant</value>
</property>

See also:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRackFaultTolerant.java

Thanks,
- Takanobu