Posted to common-user@hadoop.apache.org by Jun Young Kim <ju...@gmail.com> on 2011/03/09 12:57:56 UTC

what's the difference between file.blocksize and dfs.blocksize in job.xml?

hi,

I am wondering about the difference between file.blocksize and dfs.blocksize.

in hdfs-site.xml, I set
<property>
<name>dfs.block.size</name>
<value>536870912</value>
<final>true</final>
</property>

in job.xml, I found:

file.blocksize     67108864
dfs.blocksize      536870912


dfs browser page:

Name            Type  Size      Replication  Block Size  Modification Time  Permission  Owner  Group
20110309160005  dir                                      2011-03-09 16:51   rwxr-xr-x   test   supergroup
all0307.ep      file  21.53 GB  2            64 MB       2011-03-09 15:58   rw-r--r--   test   supergroup
all0307.svc     file  21.53 GB  2            64 MB       2011-03-09 15:13   rw-r--r--   test   supergroup



The total input size of the job is about 44 GB (all0307.ep + all0307.svc).
In the map step, the job produced 690 splits (that means each map task
took a single 64 MB block).

I thought the split count should be about 88, because a single block
is 512 MB and the input files total 44 GB.

How could I get the result I want?

thanks.

-- 
Junyoung Kim (juneng603@gmail.com)


Re: what's the difference between file.blocksize and dfs.blocksize in job.xml?

Posted by JunYoung Kim <ju...@gmail.com>.
hi, Harsh.

is there a way to put my file on HDFS with a different block size?

usually, I copy my files to HDFS like this:

$> hadoop fs -copyFromLocal localFile hdfsFile

do I need to add another option to this command to re-create the file?

thanks

On Mar 13, 2011, at 5:42 PM, Harsh J wrote:

> Hello,
> 
> On Wed, Mar 9, 2011 at 5:27 PM, Jun Young Kim <ju...@gmail.com> wrote:
>> hi,
>> I thought the splits counts should be about 88 because a single block size
>> is 512MB and input file's size are 44GB).
> 
> From your browser copy-paste (you could also use `fs -ls`, much more
> readable in mails :), it appears that your file has been created with
> a 64 MiB block size, not 512 MiB. Try re-creating the file with the
> new block size, and you should get what you want.
> 
> -- 
> Harsh J
> www.harshj.com


Re: what's the difference between file.blocksize and dfs.blocksize in job.xml?

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Wed, Mar 9, 2011 at 5:27 PM, Jun Young Kim <ju...@gmail.com> wrote:
> hi,
> I thought the splits counts should be about 88 because a single block size
> is 512MB and input file's size are 44GB).

From your browser copy-paste (you could also use `fs -ls`, much more
readable in mails :), it appears that your file has been created with
a 64 MiB block size, not 512 MiB. Try re-creating the file with the
new block size, and you should get what you want.
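
For instance, re-creating it could look like this (a sketch; I'm assuming a
0.20-era release where the property is still named dfs.block.size and the fs
shell honors the -D generic option; the value is in bytes, 536870912 = 512 MiB):

```shell
# Upload with a per-command block-size override (512 MiB, in bytes).
hadoop fs -D dfs.block.size=536870912 -copyFromLocal all0307.ep /user/irteam/all0307.ep

# Verify the block size recorded for the new file (%o prints it in bytes).
hadoop fs -stat %o /user/irteam/all0307.ep
```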

-- 
Harsh J
www.harshj.com