Posted to common-user@hadoop.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2008/05/17 02:42:27 UTC

dfs.block.size vs avg block size

Hello,

I checked the ML archives and the Wiki, as well as the HDFS user guide, but could not find information about how to change block size of an existing HDFS.

After running fsck I can see that my avg. block size is 12706144 B (approx. 12 MB), and that's a lot smaller than what I have configured: dfs.block.size=67108864 B (64 MB)

Does the difference between the configured block size and the actual (avg) block size effectively result in wasted space?
If so, is there a way to change the DFS block size and have Hadoop shrink all the existing blocks?
I am OK with not running any jobs on the cluster for a day or two if I can do something to free up the wasted disk space.
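The arithmetic in the question can be sketched as follows (the two byte counts are taken from the post; the interpretation that small files pull the average down is a working assumption, confirmed in the reply below):

```python
# Block size figures from the post. dfs.block.size is a per-file *maximum*;
# fsck's "avg. block size" is averaged over all blocks actually allocated.
configured_block_size = 67108864   # dfs.block.size, 64 MiB
avg_block_size = 12706144          # fsck-reported average, ~12 MiB

# If most files are smaller than one block, each such file occupies a single
# block whose size equals the file size, which pulls the average well below
# the configured maximum.
ratio = avg_block_size / configured_block_size
print(f"average block is {ratio:.1%} of the configured size")
```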


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: dfs.block.size vs avg block size

Posted by Dhruba Borthakur <dh...@gmail.com>.
There isn't a way to change the block size of an existing file. The
block size of a file can be specified only at the time of file
creation and cannot be changed later.

There isn't any wasted space in your system. If the block size is
128MB but you create an HDFS file of, say, 10MB, then that file will
contain one block, and that block will occupy only 10MB of HDFS
storage. No space gets wasted.
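This behavior can be modeled with a small sketch (the function name is mine, and it deliberately ignores replication and checksum overhead; the key point is that the last, or only, block occupies just the bytes actually written):

```python
import math

def hdfs_usage(file_size: int, block_size: int = 64 * 1024 * 1024):
    """Model HDFS file layout: fixed-size blocks, but the last (or only)
    block occupies only the bytes actually written, not the full block."""
    num_blocks = max(1, math.ceil(file_size / block_size))
    # Storage consumed (per replica) is the file's real size, not
    # num_blocks * block_size -- blocks are not padded on disk.
    storage_used = file_size
    return num_blocks, storage_used

# A 10 MB file in a 128 MB-block filesystem: one block, 10 MB on disk.
blocks, used = hdfs_usage(10 * 1024 * 1024, 128 * 1024 * 1024)
print(blocks, used)  # 1 10485760
```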

hope this helps,
dhruba

On Fri, May 16, 2008 at 4:42 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote: