Posted to user@hbase.apache.org by Panshul Whisper <ou...@gmail.com> on 2013/01/11 04:07:01 UTC

HDFS disk space requirements

Hello,

I have a 5 node Hadoop cluster and a fully distributed HBase setup on the
cluster, with 130 GB of HDFS space available. HDFS replication is set to 5.

I have a total of 115 GB of JSON files that need to be loaded into the
HBase database and then processed.

Is the available HDFS space sufficient for these operations, considering
replication and all other factors?
Or should I increase the space, and by how much?

Thanking You,

-- 
Regards,
Ouch Whisper
010101010101

Re: HDFS disk space requirements

Posted by "Mesika, Asaf" <as...@gmail.com>.
130 GB of raw data will take more space once loaded into HBase, since HBase stores the row key, column family name, qualifier and timestamp alongside each value; it can easily grow to 150 GB or more. You can measure this exactly by loading a single representative row with one column and checking how much space it occupies on the HDFS file system (run a major compaction first so the figure is accurate).

Next, multiply that by 5 since you have 5x replication: 5 x 150 GB = 750 GB.

On Jan 11, 2013, at 5:07 AM, Panshul Whisper wrote:


Re: HDFS disk space requirements

Posted by Panshul Whisper <ou...@gmail.com>.
This is really helpful, thanks so much for the ideas!

-- 
Regards,
Ouch Whisper
010101010101

Re: HDFS disk space requirements

Posted by Leonid Fedotov <lf...@hortonworks.com>.
What is the reason for having the replication factor set to 5?
Change it to 3 and you will save 40% of the space.
Also, you can load your JSON data into a separate folder with replication set to 1, since it is only the source data and will be gone after processing.
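A quick back-of-the-envelope comparison of the two plans, using the 150 GB HBase estimate from the earlier reply and the 115 GB of raw JSON from the question (the staging-folder path in the comment is a made-up example; per-path replication can be set with the standard `hadoop fs -setrep` shell command):

```python
# Back-of-the-envelope HDFS usage under different replication choices.
# 150 GB = estimated on-disk HBase size from the earlier reply;
# 115 GB = raw JSON staging data from the question.

hbase_gb = 150
json_gb = 115

# Plan A: everything replicated 5x (the original setup).
plan_a = hbase_gb * 5 + json_gb * 5

# Plan B: HBase at 3x, JSON staging folder at 1x
# (e.g. `hadoop fs -setrep 1 /staging/json` -- hypothetical path).
plan_b = hbase_gb * 3 + json_gb * 1

# Dropping replication from 5 to 3 saves (5 - 3) / 5 of the space.
saving_pct = 100 * (5 - 3) / 5

print(f"plan A: {plan_a} GB, plan B: {plan_b} GB, saving: {saving_pct:.0f}%")
```

Under these assumptions Plan B needs well under the 130 GB per-copy budget only if the figures are divided across the 5 nodes; either way, the cluster-wide totals show why replication factor is the first knob to turn.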

Thank you!

Sincerely,
Leonid Fedotov


On Jan 10, 2013, at 7:07 PM, Panshul Whisper wrote: