Posted to common-user@hadoop.apache.org by Vinayakumar B <vi...@huawei.com> on 2016/02/16 09:04:13 UTC

[Important] What is the practical maximum HDFS blocksize used in clusters?

Hi All,

Just wanted to know: what are the maximum and practical dfs.block.size values used in production/test clusters?

  Current default value is 128MB, and it can support up to 128TB (yup, right; it's just a configuration value, though).
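
  For reference, the block size is a per-file setting chosen by the writing client, not a hard cluster-wide limit. A minimal sketch, assuming a reachable HDFS instance, a hypothetical /tmp/bigfile path, and 1GB purely as an example value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "dfs.blocksize" is the current name of the older "dfs.block.size" key;
        // setting it here only changes the default for files this client creates.
        conf.setLong("dfs.blocksize", 1024L * 1024 * 1024); // 1 GB

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/bigfile"); // hypothetical path

        // The block size can also be passed explicitly per file at create time.
        try (FSDataOutputStream out =
                 fs.create(file, true, 4096, (short) 3, 1024L * 1024 * 1024)) {
          out.writeBytes("example");
        }
        System.out.println("default block size: " + fs.getDefaultBlockSize(file));
      }
    }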

   I have seen clusters using up to 1GB block size for big files.

   Is there anyone using >2GB for block size?

  This is just to check whether any compatibility issue arises if we reduce the max supported blocksize to 32GB (to be on the safer side).

-vinay

Re: [Important] What is the practical maximum HDFS blocksize used in clusters?

Posted by Kihwal Lee <ki...@yahoo-inc.com.INVALID>.
There can be issues coming from the assumption that normal block replication takes under x seconds. Until recently, the partial block copying during pipeline recoveries made such an assumption. We saw the copies were often not complete in 20 seconds even with 384MB blocks on busy systems. This has been fixed, but I suspect there are other places with similar problems. For one, the pending replication monitor has a fixed check interval, which may still work for 1GB blocks but start to break with bigger blocks. Simply increasing the check interval will have an adverse effect on smaller blocks.
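
A toy illustration of that kind of timing assumption (hypothetical code, not the actual pending replication monitor; the transfer rate and deadlines are assumed values): a fixed deadline that is comfortable for 128MB-1GB blocks quietly becomes too tight once blocks reach tens of GB, while a deadline scaled by block size does not penalise small blocks.

    // Hypothetical sketch, not HDFS code: a fixed replication deadline
    // versus one scaled by block size.
    public class ReplicationTimeoutSketch {
      static final long FIXED_TIMEOUT_MS = 5 * 60 * 1000; // "a copy finishes in 5 minutes"

      // Scale the deadline with block size, but never go below a floor,
      // so small blocks are not penalised.
      static long scaledTimeoutMs(long blockBytes, long floorMs, double bytesPerMs) {
        return Math.max(floorMs, (long) (blockBytes / bytesPerMs));
      }

      public static void main(String[] args) {
        long[] sizes = {128L << 20, 1L << 30, 32L << 30}; // 128 MB, 1 GB, 32 GB
        double assumedRate = 100L << 10;                  // assume ~100 MB/s per copy
        for (long size : sizes) {
          System.out.printf("block %6d MB: fixed deadline %d ms, scaled %d ms%n",
              size >> 20, FIXED_TIMEOUT_MS, scaledTimeoutMs(size, 60_000, assumedRate));
        }
      }
    }

With these assumed numbers the fixed 5-minute deadline is already too short for a 32GB block, while the scaled one stays at the floor for 128MB and 1GB blocks.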
As for the max block size, I vaguely remember Accumulo wanting to use a big block size. At least we can make sure downstream projects are fine with the limit.

Kihwal


      From: Steve Loughran <st...@hortonworks.com>
 To: Hadoop Common <co...@hadoop.apache.org> 
 Sent: Tuesday, February 16, 2016 5:01 AM
 Subject: Re: [Important] What is the practical maximum HDFS blocksize used in clusters?
   

> On 16 Feb 2016, at 08:04, Vinayakumar B <vi...@huawei.com> wrote:
> 
> Hi All,
> 
> Just wanted to know: what are the maximum and practical dfs.block.size values used in production/test clusters?
> 
>  Current default value is 128MB, and it can support up to 128TB (yup, right; it's just a configuration value, though).
> 
>  I have seen clusters using up to 1GB block size for big files.
> 
>  Is there anyone using >2GB for block size?
> 
>  This is just to check whether any compatibility issue arises if we reduce the max supported blocksize to 32GB (to be on the safer side).
> 
> -vinay

Irrespective of whether the code handles blocks > 32 bits in size, corruption and recovery are handled at the block level, so if HDD/SSD bit corruption were uniform across the storage layer, the bigger the block, the higher the likelihood of corruption (that's assuming the cause is failures in the storage medium itself, not the wiring, controller, ...). In theory, a 4GB block should be corrupted 8x as often as a 512MB block. And its time to replicate would be longer, which means there is an increased probability of multiple replica block corruptions occurring.

That's "in theory"; I've not seen any real data on that. I'd  like to. And in the meantime, if I were using very large blocks, make sure that the background block checksum thread is working away.

  

Re: [Important] What is the practical maximum HDFS blocksize used in clusters?

Posted by Steve Loughran <st...@hortonworks.com>.
> On 16 Feb 2016, at 08:04, Vinayakumar B <vi...@huawei.com> wrote:
> 
> Hi All,
> 
> Just wanted to know: what are the maximum and practical dfs.block.size values used in production/test clusters?
> 
>  Current default value is 128MB, and it can support up to 128TB (yup, right; it's just a configuration value, though).
> 
>   I have seen clusters using up to 1GB block size for big files.
> 
>   Is there anyone using >2GB for block size?
> 
>  This is just to check whether any compatibility issue arises if we reduce the max supported blocksize to 32GB (to be on the safer side).
> 
> -vinay

Irrespective of whether the code handles blocks > 32 bits in size, corruption and recovery are handled at the block level, so if HDD/SSD bit corruption were uniform across the storage layer, the bigger the block, the higher the likelihood of corruption (that's assuming the cause is failures in the storage medium itself, not the wiring, controller, ...). In theory, a 4GB block should be corrupted 8x as often as a 512MB block. And its time to replicate would be longer, which means there is an increased probability of multiple replica block corruptions occurring.
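
A back-of-the-envelope sketch of that 8x figure, under the same assumption of uniform, independent bit errors (the bit error rate used is purely illustrative):

    // Rough check of the "8x" claim, assuming uniform, independent bit errors;
    // the error rate p is purely illustrative.
    public class CorruptionOdds {
      // P(at least one flipped bit in the block) = 1 - (1 - p)^(8 * bytes)
      static double pBlockCorrupt(double pBitError, long blockBytes) {
        return 1.0 - Math.pow(1.0 - pBitError, 8.0 * blockBytes);
      }

      public static void main(String[] args) {
        double p = 1e-15;                              // assumed bit error rate
        double small = pBlockCorrupt(p, 512L << 20);   // 512 MB block
        double big   = pBlockCorrupt(p, 4L << 30);     // 4 GB block
        System.out.printf("512MB: %.3e   4GB: %.3e   ratio: %.1f%n",
            small, big, big / small);
      }
    }

For small p the per-block corruption probability is roughly proportional to block size, so the ratio comes out at about 4GB / 512MB = 8.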

That's "in theory"; I've not seen any real data on that. I'd  like to. And in the meantime, if I were using very large blocks, make sure that the background block checksum thread is working away.