Posted to user@hbase.apache.org by marjana <mi...@us.ibm.com> on 2016/03/31 03:31:56 UTC

find size of each table in the cluster

Hello,
I am new to HBase, so sorry if I am talking nonsense.

I am trying to figure out how to find the total size of each table in
my HBase cluster.
I have looked into the hbase shell commands. There's "status 'detailed'", which
shows storefileSizeMB. If I were to add up all of these grouped by table name,
would that be the correct way to show MB used per table?
Is there any other (easier/cleaner) way?
HBase version is 1.1.1.2.3, HDFS: 2.7.1
Thanks
Marjana



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/find-size-of-each-table-in-the-cluster-tp4078899.html
Sent from the HBase User mailing list archive at Nabble.com.
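
For reference, the "add up storefileSizeMB grouped by table name" idea from the question can be sketched in shell. This is only a sketch under assumptions: it assumes that each region appears in the "status 'detailed'" output as a quoted line of the form "table,startkey,timestamp.encoded." followed by a metrics line containing storefileSizeMB=N, and the function name sum_storefile_mb is made up for illustration. Check the layout against your own output before trusting the totals.

```shell
# Hypothetical helper: reads "status 'detailed'" output on stdin and prints
# "<table><TAB><total storefileSizeMB>" for every table, sorted by name.
sum_storefile_mb() {
  awk '
    # A quoted line names a region; the table name is everything before
    # the first comma (assumed layout, see the note above).
    /^[ \t]*"/ {
      line = $0
      sub(/^[ \t]*"/, "", line)   # drop leading whitespace and the quote
      sub(/,.*$/, "", line)       # keep only the table name
      table = line
      next
    }
    # The first metrics line after a region name carries that region size.
    # Server-level summary lines are skipped because no region is pending.
    table != "" && match($0, /storefileSizeMB=[0-9]+/) {
      # 16 is the length of the string "storefileSizeMB="
      total[table] += substr($0, RSTART + 16, RLENGTH - 16)
      table = ""
    }
    END { for (t in total) printf "%s\t%d\n", t, total[t] }
  ' | LC_ALL=C sort
}

# Possible usage on a cluster node (not run here):
#   echo "status 'detailed'" | hbase shell | sum_storefile_mb
```

Tracking the pending region name is what keeps the per-server summary line, which also contains storefileSizeMB, out of the per-table totals.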

RE: find size of each table in the cluster

Posted by Ted Tuttle <te...@mentacapital.com>.
Hello-

We are running a v0.94.9 cluster.

I am seeing that 'fs -dus' reports 24.5TB used and 'fs -df' reports 74.2TB used.

Does anyone know why these do not reconcile? Our replication factor is 2, so that is not a likely explanation.

Shown below are results from my cluster (doctored to TB for ease of reading):

bash-4.1$ hadoop fs -dus /hbase
hdfs://host/hbase      24.5TB

bash-4.1$ hadoop fs -df /hbase
Filesystem      Size     Used    Avail   Use%
/hbase          103.8TB  74.2TB  24.3TB  71%
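
A quick way to look at the gap between the two numbers is to compute their ratio. As far as I know, -dus sums the logical file lengths under the given path, while -df reports raw space used on the whole filesystem, counting every replica and every directory, so the two are not expected to be equal. The helper name used_ratio below is made up for illustration, and the interpretation is a guess that the thread does not confirm.

```shell
# Hypothetical helper: prints RAW_USED / LOGICAL_SIZE with two decimals.
# $1 = raw space used as reported by -df, $2 = logical size reported by -dus.
used_ratio() {
  awk -v raw="$1" -v logical="$2" 'BEGIN { printf "%.2f\n", raw / logical }'
}

# Figures taken from the output above, both in TB.
used_ratio 74.2 24.5
```

A ratio near 3 rather than 2 could point to files written with a replication factor of 3, or to data outside /hbase that -df also counts. Either reading would need to be checked on the cluster.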

Re: find size of each table in the cluster

Posted by marjana <mi...@us.ibm.com>.
:)

The output of hadoop fs -du -s -h 'tablename' matches the output of "status
'detailed'" when I sum up all the storefileSizeMB values.

Thanks!




Re: find size of each table in the cluster

Posted by Ted Yu <yu...@gmail.com>.
bq. COMPRESSION => 'LZ4',

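The answer is given by the above attribute :-) The store files are LZ4-compressed
on disk, so the size reported by -du is the compressed size.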
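
To check the same attribute for other tables, the COMPRESSION value can be pulled out of the describe output with a small filter. This is a minimal sketch: the function name compression_of is made up for illustration, and it assumes the attribute is printed as COMPRESSION => 'VALUE', as in the describe output quoted in this thread.

```shell
# Hypothetical helper: reads describe output on stdin and prints the
# COMPRESSION value of each line that carries one (e.g. LZ4 or NONE).
compression_of() {
  sed -n "s/.*COMPRESSION => '\([^']*\)'.*/\1/p"
}

# Possible usage on a cluster node (not run here):
#   echo "describe 'RAWHITS_AURORA-COM'" | hbase shell | compression_of
```

Anything other than NONE means the store files, and therefore the -du figure, are compressed.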


Re: find size of each table in the cluster

Posted by marjana <mi...@us.ibm.com>.
Sure, here's the describe output for one table:

Table RAWHITS_AURORA-COM is ENABLED
RAWHITS_AURORA-COM
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZ4',
MIN_VERSIONS => '0', TTL => '2160000 SECONDS (25 DAYS)',
KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '262144', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}




Re: find size of each table in the cluster

Posted by Ted Yu <yu...@gmail.com>.
bq. data is distributed on node servers,

Data is on HDFS, i.e. on the Data Nodes. The host in hbase.rootdir is the
NameNode address that clients use to reach HDFS, not the place where the
blocks are stored.

bq. it gets propagated to all data nodes,

If I understand correctly, the -du command queries the NameNode.

bq. Is this size compressed or uncompressed?

Can you show us the table description (output of the describe command in the
hbase shell)?


Re: find size of each table in the cluster

Posted by marjana <mi...@us.ibm.com>.
Thanks all for your replies.
This is a clustered environment, with 2 master nodes and 4 data nodes. Master nodes
have these components installed (as shown in the Ambari UI):
active hbase master
history server
name node
resource manager
zookeeper server
metrics monitor

Node server has these components:
Data Node
region server
metrics monitor
node manager

So I looked on my node server for hbase.rootdir, and it points to
hdfs://hbasmaserserver:8020//apps/hbase/data.
Now this is confusing to me, as I thought the data was distributed across the node
servers, where the region servers are.
I sshed to my masterserver, looked under this dir, and did see all my
tables in my default namespace. Example:
$ hadoop fs -du -s -h /apps/hbase/data/data/default/RAWHITS_AURORA-COM
2.0 G  /apps/hbase/data/data/default/RAWHITS_AURORA-COM

So when I run this command on hbmaster, it gets propagated to all data
nodes, correct? Is this size compressed or uncompressed?

Many thanks!
Marjana





Re: find size of each table in the cluster

Posted by Tomasz Bem <to...@gmail.com>.
Hi,

This is the standard convention for the Hortonworks distribution. The first three
numbers are the HBase version, the last two are the HDP version.

Cheers
Tomasz Bem.
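
The convention described above can be illustrated with a small shell sketch that splits such a version string into its two parts. The function name split_hdp_version is made up for illustration, and the sketch assumes the string has exactly five dot-separated fields, as in 1.1.1.2.3.

```shell
# Hypothetical helper: splits a five-field version string into the HBase
# part (fields 1-3) and the HDP part (fields 4-5).
split_hdp_version() {
  hbase_part=$(printf '%s\n' "$1" | cut -d. -f1-3)
  hdp_part=$(printf '%s\n' "$1" | cut -d. -f4-5)
  printf 'HBase %s, HDP %s\n' "$hbase_part" "$hdp_part"
}

split_hdp_version 1.1.1.2.3
# prints: HBase 1.1.1, HDP 2.3
```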


Re: find size of each table in the cluster

Posted by Ted Yu <yu...@gmail.com>.
bq. hbase version is 1.1.1.2.3

I don't think there was ever such a release - there should be only 3 dots.

bq. /hbase is the default storage location for tables in hdfs

The root dir is given by the hbase.rootdir config parameter.

Here is a sample listing:

http://pastebin.com/ekF4tsYn

Under data, you would see:

drwxr-xr-x   - hbase hdfs          0 2016-03-22 20:26 /apps/hbase/data/data/default
drwxr-xr-x   - hbase hdfs          0 2016-03-14 19:13 /apps/hbase/data/data/hbase

hbase is the system namespace.

Under default (or your own namespace), you would see the table dirs. Here is a
sample:

drwxr-xr-x   - hbase hdfs          0 2016-03-22 20:26 /apps/hbase/data/data/default/elog_pn_split
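
Given that layout, one table directory per line, running hadoop fs -du on the namespace directory gives one size per table. The sketch below turns that output into "table name, size in MB". It is only a sketch under assumptions: the function name table_sizes_mb is made up for illustration, and it assumes each -du line starts with the size in bytes and ends with the path (some newer Hadoop releases print an extra column in between, which is ignored here).

```shell
# Hypothetical helper: reads "hadoop fs -du <namespace dir>" output on stdin
# and prints "<table><TAB><size in MB>" for each line.
table_sizes_mb() {
  awk '{
    n = split($NF, parts, "/")   # table name is the last path component
    printf "%s\t%.1f\n", parts[n], $1 / (1024 * 1024)
  }'
}

# Possible usage on a cluster node (not run here):
#   hadoop fs -du /apps/hbase/data/data/default | table_sizes_mb
```

The sizes are on-disk sizes, so for tables with compression enabled they are compressed sizes, and they do not include HDFS replication.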


Re: find size of each table in the cluster

Posted by Stephen Durfey <sj...@gmail.com>.
I believe the easiest way would be to run 'hadoop dfs -du -h /hbase'. I believe /hbase is the default storage location for tables in HDFS. The size will be either compressed or uncompressed, depending on whether compression is enabled.





