Posted to hdfs-user@hadoop.apache.org by WangRamon <ra...@hotmail.com> on 2012/11/21 08:00:56 UTC

Is there an additional overhead when storing data in HDFS?

Hi All

I'm wondering if there is any additional overhead when storing data in HDFS. For example, I have a 2GB file and the replication factor of HDFS is 2; when the file is uploaded to HDFS, will HDFS use 4GB to store it, or more than 4GB? If it takes more than 4GB of space, why?

Thanks
Ramon
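
The arithmetic behind the question is just file size times replication factor; the replies below explain what gets added on top of that. A minimal sketch of this first-order estimate, using the 2GB / replication-factor-2 numbers from the question (checksum and metadata overhead are ignored here and covered in the replies):

    # First-order estimate: raw HDFS usage = logical file size x replication factor.
    # Uses the numbers from the question above; checksums and namenode metadata
    # (discussed in the replies) are not counted here.
    file_size_gb = 2
    replication_factor = 2
    raw_usage_gb = file_size_gb * replication_factor
    print(raw_usage_gb)  # -> 4 (GB), before any per-replica checksum overhead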

RE: Is there an additional overhead when storing data in HDFS?

Posted by WangRamon <ra...@hotmail.com>.
Thanks guys, great job.

From: dontariq@gmail.com
Date: Wed, 21 Nov 2012 13:23:08 +0530
Subject: Re: Is there an additional overhead when storing data in HDFS?
To: user@hadoop.apache.org

Hello Ramon,

Why don't you go through this link once: http://www.aosabook.org/en/hdfs.html
Suresh and guys have explained everything beautifully.

HTH

Regards,
    Mohammad Tariq


On Wed, Nov 21, 2012 at 12:58 PM, Suresh Srinivas <su...@hortonworks.com> wrote:

> Namenode will have trivial amount of data stored in journal/fsimage.
>
> On Tue, Nov 20, 2012 at 11:21 PM, WangRamon <ra...@hotmail.com> wrote:
>
>> Thanks. Besides the checksum data, is there anything else? Data in the
>> name node?
>>
>> Date: Tue, 20 Nov 2012 23:14:06 -0800
>> Subject: Re: Is there an additional overhead when storing data in HDFS?
>> From: suresh@hortonworks.com
>> To: user@hadoop.apache.org
>>
>> HDFS uses 4GB for the file + checksum data.
>>
>> Default is for every 512 bytes of data, 4 bytes of checksum are stored.
>> In this case additional 32MB data.
>>
>> On Tue, Nov 20, 2012 at 11:00 PM, WangRamon <ra...@hotmail.com> wrote:
>>
>> Hi All
>>
>> I'm wondering if there is any additional overhead when storing data in
>> HDFS. For example, I have a 2GB file and the replication factor of HDFS is
>> 2; when the file is uploaded to HDFS, will HDFS use 4GB to store it, or
>> more than 4GB? If it takes more than 4GB of space, why?
>>
>> Thanks
>> Ramon
>>
>> --
>> http://hortonworks.com/download/
>
> --
> http://hortonworks.com/download/

Re: Is there an additional overhead when storing data in HDFS?

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Ramon,

 Why don't you go through this link once :
http://www.aosabook.org/en/hdfs.html
Suresh and guys have explained everything beautifully.

HTH

Regards,
    Mohammad Tariq



On Wed, Nov 21, 2012 at 12:58 PM, Suresh Srinivas <su...@hortonworks.com> wrote:

> Namenode will have trivial amount of data stored in journal/fsimage.
>
>
> On Tue, Nov 20, 2012 at 11:21 PM, WangRamon <ra...@hotmail.com> wrote:
>
>> Thanks, besides the checksum data is there anything else? Data in name
>> node?
>>
>> ------------------------------
>> Date: Tue, 20 Nov 2012 23:14:06 -0800
>> Subject: Re: Is there an additional overhead when storing data in HDFS?
>> From: suresh@hortonworks.com
>> To: user@hadoop.apache.org
>>
>>
>> HDFS uses 4GB for the file + checksum data.
>>
>> Default is for every 512 bytes of data, 4 bytes of checksum are stored.
>> In this case additional 32MB data.
>>
>> On Tue, Nov 20, 2012 at 11:00 PM, WangRamon <ra...@hotmail.com> wrote:
>>
>> Hi All
>>
>> I'm wondering if there is any additional overhead when storing data in
>> HDFS. For example, I have a 2GB file and the replication factor of HDFS is
>> 2; when the file is uploaded to HDFS, will HDFS use 4GB to store it, or
>> more than 4GB? If it takes more than 4GB of space, why?
>>
>> Thanks
>> Ramon
>>
>>
>>
>>
>> --
>> http://hortonworks.com/download/
>>
>>
>
>
> --
> http://hortonworks.com/download/
>
>

Re: Is there an additional overhead when storing data in HDFS?

Posted by Suresh Srinivas <su...@hortonworks.com>.
Namenode will have trivial amount of data stored in journal/fsimage.

On Tue, Nov 20, 2012 at 11:21 PM, WangRamon <ra...@hotmail.com> wrote:

> Thanks, besides the checksum data is there anything else? Data in name
> node?
>
> ------------------------------
> Date: Tue, 20 Nov 2012 23:14:06 -0800
> Subject: Re: Is there an additional overhead when storing data in HDFS?
> From: suresh@hortonworks.com
> To: user@hadoop.apache.org
>
>
> HDFS uses 4GB for the file + checksum data.
>
> Default is for every 512 bytes of data, 4 bytes of checksum are stored. In
> this case additional 32MB data.
>
> On Tue, Nov 20, 2012 at 11:00 PM, WangRamon <ra...@hotmail.com> wrote:
>
> Hi All
>
> I'm wondering if there is any additional overhead when storing data in
> HDFS. For example, I have a 2GB file and the replication factor of HDFS is
> 2; when the file is uploaded to HDFS, will HDFS use 4GB to store it, or
> more than 4GB? If it takes more than 4GB of space, why?
>
> Thanks
> Ramon
>
>
>
>
> --
> http://hortonworks.com/download/
>
>


-- 
http://hortonworks.com/download/
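
For a sense of scale on the namenode side, the sketch below estimates the metadata for a single file. The roughly 150 bytes per namespace object and the 64MB block size are assumptions (a commonly cited rule of thumb and an old default, not figures from this thread):

    import math

    # Back-of-the-envelope estimate of namenode metadata for one file.
    # ASSUMPTIONS (not from the thread): ~150 bytes per namespace object
    # (the file entry plus each block) and a 64 MB block size; actual
    # fsimage/edit-log sizes depend on the Hadoop version and configuration.
    file_size_bytes = 2 * 1024**3        # the 2 GB file from the question
    block_size_bytes = 64 * 1024**2      # assumed block size
    bytes_per_object = 150               # rule-of-thumb metadata cost per object

    num_blocks = math.ceil(file_size_bytes / block_size_bytes)
    metadata_bytes = (1 + num_blocks) * bytes_per_object  # 1 file entry + its blocks
    print(num_blocks, metadata_bytes)    # 32 blocks -> about 5 KB, trivial next to 4 GB of blocks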

RE: Is there an additional overhead when storing data in HDFS?

Posted by WangRamon <ra...@hotmail.com>.
Thanks. Besides the checksum data, is there anything else? Data in the name node?
 Date: Tue, 20 Nov 2012 23:14:06 -0800
Subject: Re: Is there an additional overhead when storing data in HDFS?
From: suresh@hortonworks.com
To: user@hadoop.apache.org

HDFS uses 4GB for the file + checksum data.
Default is for every 512 bytes of data, 4 bytes of checksum are stored. In this case additional 32MB data.

On Tue, Nov 20, 2012 at 11:00 PM, WangRamon <ra...@hotmail.com> wrote:




Hi All

I'm wondering if there is any additional overhead when storing data in HDFS. For example, I have a 2GB file and the replication factor of HDFS is 2; when the file is uploaded to HDFS, will HDFS use 4GB to store it, or more than 4GB? If it takes more than 4GB of space, why?

Thanks
Ramon


-- 
 http://hortonworks.com/download/

Re: Is there an additional overhead when storing data in HDFS?

Posted by Suresh Srinivas <su...@hortonworks.com>.
HDFS uses 4GB for the file + checksum data.

Default is for every 512 bytes of data, 4 bytes of checksum are stored. In
this case additional 32MB data.

On Tue, Nov 20, 2012 at 11:00 PM, WangRamon <ra...@hotmail.com> wrote:

> Hi All
>
> I'm wondering if there is any additional overhead when storing data in
> HDFS. For example, I have a 2GB file and the replication factor of HDFS is
> 2; when the file is uploaded to HDFS, will HDFS use 4GB to store it, or
> more than 4GB? If it takes more than 4GB of space, why?
>
> Thanks
> Ramon
>



-- 
http://hortonworks.com/download/
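
Putting the numbers in this reply together: each block replica is stored with a checksum file holding 4 bytes of checksum for every 512 bytes of data, so the raw footprint is roughly (data + data / 512 * 4) multiplied by the replication factor. A minimal sketch of that arithmetic, using the 2GB / replication-factor-2 example from the thread:

    import math

    def hdfs_raw_usage(file_size_bytes, replication=2,
                       bytes_per_checksum=512, checksum_size=4):
        """Rough estimate of raw disk usage: replicated block data plus
        per-replica checksums. Namenode metadata and local filesystem
        overhead are not counted."""
        checksum_bytes = math.ceil(file_size_bytes / bytes_per_checksum) * checksum_size
        return (file_size_bytes + checksum_bytes) * replication

    size = 2 * 1024**3                        # 2 GB file from the question
    total = hdfs_raw_usage(size, replication=2)
    data = 2 * size                           # 4 GB of replicated block data
    print((total - data) / 1024**2, "MB of checksums")  # 32.0 MB, as in the reply above
    print(total / 1024**3, "GB total on disk")          # ~4.03 GB for the 2 GB file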
