Posted to user@hbase.apache.org by Jianwu Wang <ji...@sdsc.edu> on 2011/08/23 02:36:01 UTC

how to get precise data size in HBase?

Hi there,

     We have some data saved in HBase on HDFS. We know the following 
command reports the file size of each HBase table: hadoop fs 
-dus /hbase/tableName.

     For MySQL, we can get the exact data size of each table using the 
SQL queries shown at 
http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/. 
We can also get the on-disk file size with a command like: du -s 
/path/to/datafile. Yet the data size obtained from the SQL query is 
quite a bit smaller than the disk size reported by du -s. We think the 
above hadoop command likewise reports disk usage, not the data size in 
the database. So we are wondering whether there is a way, like a MySQL 
query but run from the hbase shell, to get the data size in HBase.  
Thanks a lot!
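
     (A minimal sketch of one way we could measure the logical data 
size ourselves, assuming the 0.90-era Java client API: scan the whole 
table and sum the serialized size of every KeyValue. The table name and 
caching value below are placeholders, and a full scan of a big table is 
of course expensive.)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class TableDataSize {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "tableName"); // placeholder name
        Scan scan = new Scan();
        scan.setCaching(1000); // fetch rows in batches during the scan
        ResultScanner scanner = table.getScanner(scan);
        long totalBytes = 0;
        try {
          for (Result result : scanner) {
            for (KeyValue kv : result.raw()) {
              // getLength() counts the whole cell: key fields plus value
              totalBytes += kv.getLength();
            }
          }
        } finally {
          scanner.close();
          table.close();
        }
        System.out.println("Logical data size in bytes: " + totalBytes);
      }
    }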

-- 

Best wishes

Sincerely yours

Jianwu Wang
jianwu@sdsc.edu


Re: how to get precise data size in HBase?

Posted by Jianwu Wang <ji...@sdsc.edu>.
Hi Lars,

     Thanks for your info. Our data is dense and no compression is used.
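
     (For reference, in case we try compression later: a minimal sketch 
of enabling GZ compression on an existing column family from the hbase 
shell, where 'tableName' and 'cf' are placeholders; the table has to be 
disabled while it is altered, and a major compaction then rewrites the 
existing store files in compressed form.)

    hbase> disable 'tableName'
    hbase> alter 'tableName', {NAME => 'cf', COMPRESSION => 'GZ'}
    hbase> enable 'tableName'
    hbase> major_compact 'tableName'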

     We saw a blog on HBase architecture at 
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html. 
It looks like 'hbase org.apache.hadoop.hbase.io.hfile.HFile' can print 
more detailed info for each HFile, including fields like 
'totalBytes=84055'. That totalBytes value is smaller than the value 
reported by "hadoop fs -dus" (84447 in the example). We are still 
trying to understand what these values really mean.
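
     (Concretely, the invocation we mean is sketched below; the region 
and family path components are placeholders one would read from hadoop 
fs -ls output. Our guess, not yet confirmed, is that the gap between 
totalBytes and the HDFS file size is per-file overhead such as the 
block index, file info and trailer, which count toward the file size 
but not toward totalBytes.)

    # list the store files of one table (region/family are placeholders)
    hadoop fs -ls /hbase/tableName/<regionName>/<familyName>
    # -m prints the HFile's metadata, including totalBytes
    hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f \
        /hbase/tableName/<regionName>/<familyName>/<hfileName>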


On 8/22/11 10:46 PM, lars hofhansl wrote:
> Hi Jianwu,
>
>
> Are you using compression?
> Is your data sparse or dense? (I.e. for a typical row key, do all or most columns in your "schema" have values, or only a few)?
>
>
> With HBase you need to keep in mind that each value is tagged with (row key, column family name, column qualifier, timestamp).
> That allows it to store data in a sparse way, but also means that each value comes with a lot of baggage.
>
>
> I've heard somewhere that a 3T Oracle database expanded to 28T in HBase without compression and to about 5T with GZ compression.
> That is just an anecdote, though, and probably stems from the fact that each column in Oracle was transferred to HBase, even empty (null) ones.
>
>
> -- Lars
>
>
>
> ________________________________
> From: Jianwu Wang<ji...@sdsc.edu>
> To: user@hbase.apache.org
> Sent: Monday, August 22, 2011 5:36 PM
> Subject: how to get precise data size in HBase?
>
> Hi there,
>
>      We have some data saved in HBase on HDFS. We know the following command reports the file size of each HBase table: hadoop fs -dus /hbase/tableName.
>
>      For MySQL, we can get the exact data size of each table using the SQL queries shown at http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/. We can also get the on-disk file size with a command like: du -s /path/to/datafile. Yet the data size obtained from the SQL query is quite a bit smaller than the disk size reported by du -s. We think the above hadoop command likewise reports disk usage, not the data size in the database. So we are wondering whether there is a way, like a MySQL query but run from the hbase shell, to get the data size in HBase.  Thanks a lot!
>


-- 

Best wishes

Sincerely yours

Jianwu Wang
jianwu@sdsc.edu


Re: how to get precise data size in HBase?

Posted by lars hofhansl <lh...@yahoo.com>.
Hi Jianwu,


Are you using compression?
Is your data sparse or dense? (I.e. for a typical row key, do all or most columns in your "schema" have values, or only a few)?


With HBase you need to keep in mind that each value is tagged with (row key, column family name, column qualifier, timestamp).
That allows it to store data in a sparse way, but also means that each value comes with a lot of baggage.
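
(To make that baggage concrete, a back-of-the-envelope example based on 
the serialized KeyValue layout: 4-byte key length + 4-byte value length 
+ 2-byte row length + row + 1-byte family length + family + qualifier + 
8-byte timestamp + 1-byte key type, then the value. A hypothetical cell 
with row "user123" (7 bytes), family "d" (1 byte), qualifier "email" 
(5 bytes) and a 7-byte value therefore takes about

    4 + 4 + 2 + 7 + 1 + 1 + 5 + 8 + 1 + 7 = 40 bytes

i.e. roughly 33 bytes of framing and repeated coordinates around 7 
bytes of actual data, and the row/family/qualifier are repeated for 
every column of every row.)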


I've heard somewhere that a 3T Oracle database expanded to 28T in HBase without compression and to about 5T with GZ compression.
That is just an anecdote, though, and probably stems from the fact that each column in Oracle was transferred to HBase, even empty (null) ones.


-- Lars



________________________________
From: Jianwu Wang <ji...@sdsc.edu>
To: user@hbase.apache.org
Sent: Monday, August 22, 2011 5:36 PM
Subject: how to get precise data size in HBase?

Hi there,

    We have some data saved in HBase on HDFS. We know the following command reports the file size of each HBase table: hadoop fs -dus /hbase/tableName.

    For MySQL, we can get the exact data size of each table using the SQL queries shown at http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/. We can also get the on-disk file size with a command like: du -s /path/to/datafile. Yet the data size obtained from the SQL query is quite a bit smaller than the disk size reported by du -s. We think the above hadoop command likewise reports disk usage, not the data size in the database. So we are wondering whether there is a way, like a MySQL query but run from the hbase shell, to get the data size in HBase.  Thanks a lot!

-- 
Best wishes

Sincerely yours

Jianwu Wang
jianwu@sdsc.edu