Posted to user@hive.apache.org by "peter.marron@baesystems.com" <pe...@baesystems.com> on 2015/12/15 10:39:23 UTC

Table statistics

Hi,

I was wondering if there is any "recognized" way to obtain table statistics.
Ideally, given a Key range I would like to know the number of distinct rowids, entries and amount of data (in bytes) in that key range.
I assume that Accumulo holds at least some of this information internally, partly because I can see some of this
through the monitor, and partly because it must know something about the quantity of data held in order to be able
to implement the table threshold.
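
To make the three quantities concrete, here is a brute-force sketch in plain Python of what I would ideally like to obtain without a full scan. It does not use the Accumulo API. The entry layout (row, column, value), the inclusive row range and the function name range_stats are all made up for illustration, and the byte count is simply the sum of the encoded row, column and value lengths, which is an assumption about what "amount of data" should mean.

```python
from bisect import bisect_left, bisect_right


def range_stats(entries, start_row, end_row):
    """Return (distinct_rows, entry_count, total_bytes) for an inclusive row range.

    entries is a hypothetical stand-in for a table: a list of
    (row, column, value) tuples sorted by row, where row and column are
    str and value is bytes.
    """
    rows = [row for row, _, _ in entries]
    lo = bisect_left(rows, start_row)   # first entry with row >= start_row
    hi = bisect_right(rows, end_row)    # one past the last entry with row <= end_row
    selected = entries[lo:hi]

    distinct_rows = len({row for row, _, _ in selected})
    total_bytes = sum(
        len(row.encode()) + len(col.encode()) + len(val)
        for row, col, val in selected
    )
    return distinct_rows, len(selected), total_bytes
```

Against a real table the equivalent would be iterating a scanner over the range, which is exactly the cost I am hoping to avoid by using statistics the system already keeps.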

In my case the tables are very static, so the "estimates" that the monitor has are likely to be sufficiently accurate for my purposes.

I have found this link
http://apache-accumulo.1065345.n5.nabble.com/Determining-tablets-assigned-to-table-splits-and-the-number-of-rows-in-each-tablet-td11546.html
which describes a process (which I haven't tried yet) to get the number of entries in a range.
That would probably be sufficient for me and would certainly be a good start.
However, it seems to use internal data structures and unpublished APIs, which is less than ideal,
and it seems to be written against Accumulo 1.6.

I'm using Accumulo 1.7. Is there anything better that I can do, or is this the recommended way to go?

Regards,

Z
Please consider the environment before printing this email. This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory. The contents of this email may relate to dealings with other companies under the control of BAE Systems Applied Intelligence Limited, details of which can be found at http://www.baesystems.com/Businesses/index.htm.

FW: Table statistics

Posted by "peter.marron@baesystems.com" <pe...@baesystems.com>.
Sorry, wrong list.
Z

From: peter.marron@baesystems.com [mailto:peter.marron@baesystems.com]
Sent: 15 December 2015 09:39
To: user@hive.apache.org
Subject: Table statistics
