Posted to user@hbase.apache.org by Jianwu Wang <ji...@sdsc.edu> on 2011/08/23 02:36:01 UTC
how to get precise data size in hbase?
Hi there,
We have some data saved in HBase on HDFS. We know the following
command gives the file size of each HBase table: hadoop fs
-dus /hbase/tableName.
For MySQL, we can get the exact data size for each table using the
SQL queries shown at
http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/.
We can also get the on-disk file size with a command like: du -s
/path/to/datafile. Yet the data size reported by the SQL query is quite
a bit smaller than the on-disk size reported by du -s. We think the above
hadoop command also reports on-disk file size, not the data size in the
database. So we are wondering whether there is something like a MySQL
query that we can run in the HBase shell to get the data size in HBase.
Thanks a lot!
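One thing worth noting when comparing these figures: `hadoop fs -dus` reports the logical byte count before HDFS replication, so the raw space consumed on disk is roughly that number times the replication factor. A minimal sketch, using the byte count mentioned later in this thread and assuming the HDFS default replication factor of 3:

```shell
# Hypothetical: 'hadoop fs -dus /hbase/mytable' printed 84447 bytes.
DUS_BYTES=84447
REPLICATION=3                      # HDFS default replication factor (assumed)
RAW_DISK=$((DUS_BYTES * REPLICATION))
echo "pre-replication size: ${DUS_BYTES} bytes"
echo "approx raw disk usage: ${RAW_DISK} bytes"   # 253341
```

So even before asking about HBase's per-value overhead, the HDFS-level figure and the raw disk figure differ by the replication factor.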
--
Best wishes
Sincerely yours
Jianwu Wang
jianwu@sdsc.edu
Re: how to get precise data size in hbase?
Posted by Jianwu Wang <ji...@sdsc.edu>.
Hi Lars,
Thanks for your info. Our data is dense and no compression is used.
We saw a blog on HBase architecture at
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
It looks like 'hbase org.apache.hadoop.hbase.io.hfile.HFile' can provide
more detailed info for each HFile, including fields like
'totalBytes=84055'. The totalBytes value is smaller than the value
reported by "hadoop fs -dus" (84447 in the example). We are still trying
to understand what these values really mean.
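A plausible reading of the gap between the two numbers: 'totalBytes' counts the key/value data, while the file on HDFS also carries block indexes, metadata blocks, and the file trailer. A small sketch using the two figures from this thread (the store-file path in the comment is hypothetical):

```shell
# Inspecting a single store file's metadata (path is a placeholder):
#   hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/mytable/<region>/<cf>/<storefile>
DUS_BYTES=84447       # reported by 'hadoop fs -dus'
TOTAL_BYTES=84055     # 'totalBytes' printed by the HFile tool
echo "non-data bytes: $((DUS_BYTES - TOTAL_BYTES))"   # 392
```

Those 392 bytes would presumably be the index/trailer/metadata portion of the HFile, though confirming that would require checking the HFile format for the HBase version in use.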
--
Best wishes
Sincerely yours
Jianwu Wang
jianwu@sdsc.edu
Re: how to get precise data size in hbase?
Posted by lars hofhansl <lh...@yahoo.com>.
Hi Jianwu,
Are you using compression?
Is your data sparse or dense? (I.e. for a typical row key, do all or most columns in your "schema" have values, or only a few)?
With HBase you need to keep in mind that each value is tagged with (rowkey, column family name, column qualifier, timestamp).
That allows it to store data in a sparse way, but also means that each value comes with a lot of baggage.
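The per-value tagging described above can be sketched numerically. The field sizes below are made up for illustration; the fixed framing figure matches the KeyValue on-disk layout of this era (4-byte key length + 4-byte value length + 2-byte row length + 1-byte family length + 8-byte timestamp + 1-byte key type = 20 bytes):

```shell
# Illustrative field sizes (assumed, not from the thread):
ROWKEY_LEN=16; FAMILY_LEN=2; QUALIFIER_LEN=8; VALUE_LEN=4
# Fixed KeyValue framing, per the layout described above:
FIXED_OVERHEAD=20
PER_CELL=$((ROWKEY_LEN + FAMILY_LEN + QUALIFIER_LEN + VALUE_LEN + FIXED_OVERHEAD))
echo "bytes stored per cell: ${PER_CELL}"   # 50 bytes to hold a 4-byte value
```

With numbers like these, a 4-byte value costs roughly 12x its own size on disk before compression, which is why dense relational data can balloon when moved into HBase.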
I've heard somewhere that a 3T Oracle database expanded to 28T in HBase without compression and to about 5T with GZ compression.
That is just an anecdote, though, and probably stems from the fact that each column in Oracle was transferred to HBase, even empty (null) ones.
-- Lars