
Posted to user@hbase.apache.org by nicolas maillard <ni...@fifty-five.com> on 2012/12/11 15:50:55 UTC

Counter and Coprocessor Musing

Hi everyone

While working with HBase and looking at what the tables and meta look like, I have
thought of a couple of things; maybe someone has insights.
My thoughts are about counting. Counting the entries for a given query is a common
database operation, for example as a first check that everything was written, or
sometimes to get a feel for a population.
I was wondering two things:
- Shouldn't HBase keep each table's total entry count in its metrics?
This would not take much space and often comes in handy. Granted, with a
coprocessor you could easily create a table with counters for all the other
tables in the system, but it would be nice to have as a standard feature.

- I was also wondering whether every region could know the number of entries it
contains. Every region already knows the start and end keys of its entries, so for a
count over a given scan this would speed things up: every region whose start and end
keys both fall inside the scan would just send back its population count, and only a
region that extends beyond the scan boundaries would need to be scanned and counted.
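The per-region idea above can be sketched as follows. This is a minimal, self-contained Java model under stated assumptions, not HBase code: the Region, CountResult and RangeCounter classes and the cachedCount field are hypothetical, and row keys are Strings rather than the byte[] keys HBase actually compares. Regions lying entirely inside the scan range contribute their stored count; only regions straddling a scan boundary are scanned row by row.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of a region covering row keys in [startKey, endKey).
// A null endKey means "until the end of the table", like HBase's empty end key.
// String keys are a simplification; HBase compares row keys as byte[].
final class Region {
    final String startKey;
    final String endKey;
    final NavigableSet<String> rows;
    // Hypothetical per-region row count, assumed to be kept up to date on writes.
    final long cachedCount;

    Region(String startKey, String endKey, NavigableSet<String> rows) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.rows = rows;
        this.cachedCount = rows.size();
    }

    static Region of(String startKey, String endKey, String... rowKeys) {
        return new Region(startKey, endKey, new TreeSet<>(Arrays.asList(rowKeys)));
    }
}

final class CountResult {
    final long rowCount;       // total rows in the scan range
    final int regionsScanned;  // regions that had to be scanned row by row

    CountResult(long rowCount, int regionsScanned) {
        this.rowCount = rowCount;
        this.regionsScanned = regionsScanned;
    }
}

final class RangeCounter {
    // Counts rows in [scanStart, scanStop); a null scanStop means "no upper bound".
    static CountResult count(List<Region> regions, String scanStart, String scanStop) {
        if (scanStop != null && scanStart.compareTo(scanStop) >= 0) {
            return new CountResult(0, 0); // empty scan range
        }
        long total = 0;
        int scanned = 0;
        for (Region r : regions) {
            boolean overlaps = (r.endKey == null || r.endKey.compareTo(scanStart) > 0)
                    && (scanStop == null || r.startKey.compareTo(scanStop) < 0);
            if (!overlaps) {
                continue;
            }
            boolean fullyInside = r.startKey.compareTo(scanStart) >= 0
                    && (scanStop == null
                        || (r.endKey != null && r.endKey.compareTo(scanStop) <= 0));
            if (fullyInside) {
                total += r.cachedCount; // no scan needed: use the stored count
            } else {
                // Region straddles a scan boundary: count only the rows in range.
                NavigableSet<String> inRange = (scanStop == null)
                        ? r.rows.tailSet(scanStart, true)
                        : r.rows.subSet(scanStart, true, scanStop, false);
                total += inRange.size();
                scanned++;
            }
        }
        return new CountResult(total, scanned);
    }
}
```

With regions ["", "d"), ["d", "h") and ["h", end), a count over ["b", "i") takes the middle region's stored count and scans only the two edge regions. The sketch assumes cachedCount is always accurate, which is the hard part discussed in the replies.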

Are these ideas already implemented, am I missing something, or would they not be a
good idea? Alternatively, if this is not a definite no for some reason, could
coprocessors be used to implement them? With a coprocessor, can I write into the
metrics, or, on a given scan, first check whether I have already stored somewhere the
count for a region that lies entirely inside my scan, instead of scanning and counting it?

Thanks for any thoughts you may have.

RE: Re:Re: Counter and Coprocessor Musing

Posted by Anoop Sam John <an...@huawei.com>.

I agree with Azury.
Ted: he is describing something different from HBASE-5982.
If the row count were maintained in a separate meta table, getting it from there would be much faster than AggregateImplementation's getRowNum, I think.

For a specific use case, someone could build this with a coprocessor (CP), but a generic implementation might be difficult. How would we handle versioning? When a new version arrives for an existing row, the count should not be incremented. TTLs would also have to be handled.
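The versioning point above can be illustrated with a hypothetical in-memory model (not the HBase coprocessor API): the table maps each row key to its timestamped versions, and the row count changes only when a row first appears or is deleted. The class and method names are illustrative assumptions. Concurrent writers and TTLs are not modeled; cells that expire through a TTL disappear without any delete call, so a counter like this one would drift unless it also accounted for expiry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table (not the HBase coprocessor API):
// row key -> (timestamp -> value). Models only puts and whole-row deletes.
final class VersionAwareRowCounter {
    private final Map<String, TreeMap<Long, String>> table = new HashMap<>();
    private long rowCount = 0;   // distinct live rows
    private long putCount = 0;   // what a naive "+1 per put" counter would report

    void put(String row, long timestamp, String value) {
        putCount++;
        TreeMap<Long, String> versions = table.get(row);
        if (versions == null) {
            // Only a brand-new row changes the row count. In a real coprocessor this
            // existence check would presumably cost an extra read per put.
            versions = new TreeMap<>();
            table.put(row, versions);
            rowCount++;
        }
        versions.put(timestamp, value); // new version of an existing row: count unchanged
    }

    void deleteRow(String row) {
        if (table.remove(row) != null) {
            rowCount--; // deletes of rows that do not exist are ignored
        }
    }

    long rowCount() { return rowCount; }
    long putCount() { return putCount; }
}
```

Writing two versions of r1 and one of r2 leaves the row count at 2, while a naive per-put counter would report 3.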

-Anoop-
________________________________________
From: Azury [ziqidonglai1979@126.com]
Sent: Wednesday, December 12, 2012 9:40 AM
To: user@hbase.apache.org
Subject: Re:Re: Counter and Coprocessor Musing


Re:Re: Counter and Coprocessor Musing

Posted by Azury <zi...@126.com>.

Hi Ted,
I think he wants table 'metadata', not something like a coprocessor,
such as: long rows = table.rows();

Just a guess, though; I'm not sure.



At 2012-12-12 01:11:49,"Ted Yu" <yu...@gmail.com> wrote:

Re: Counter and Coprocessor Musing

Posted by Ted Yu <yu...@gmail.com>.

Thanks for sharing your thoughts.

Which HBase version are you currently using?
Have you looked at AggregateImplementation, which is included in the HBase jar?
It provides a count operation (getRowNum).
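For contrast, a row count computed through an endpoint such as AggregateImplementation follows a local-aggregation pattern: each region counts its own rows in range and the client sums the partial results. The sketch below models that scatter/gather shape with plain Java threads; it is not the actual AggregateImplementation code, and the list-of-lists "regions" and the LocalAggregationCount class are hypothetical stand-ins. Every region still visits each row in range, which is the cost the per-region stored count proposed in the original post would avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical scatter/gather model of a coprocessor-style row count.
// Each inner list stands in for the row keys stored in one region.
final class LocalAggregationCount {
    static long rowCount(List<List<String>> regions, String startRow, String stopRow) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (List<String> region : regions) {
                // "Server side": each region scans all of its rows in [startRow, stopRow).
                tasks.add(() -> region.stream()
                        .filter(row -> row.compareTo(startRow) >= 0 && row.compareTo(stopRow) < 0)
                        .count());
            }
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get(); // "Client side": sum the per-region partial counts
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a region count failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```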

It would be nice if you could tell us how far (in terms of
response time) this aggregation falls short of your expectations.

Also take a look at HBASE-5982 (HBase Coprocessor Local Aggregation).

Cheers

On Tue, Dec 11, 2012 at 6:50 AM, nicolas maillard <
nicolas.maillard@fifty-five.com> wrote:

> Hi everyone
>
> While working with hbase and looking at what the tables and meta look like
> I
> hava
> thought of a couple things, maybe someone has insights.
> My thoughts are around the count situation it is a current database
> process to
> count entries for a given query.
> for example as a first check to see if everything is written or sometimes
> to get
> a
> feel of a population.
> I was wondering 2 things:
> - Should'nt Hbase keep in the metrics for a table it's total entry count?
> this would not take too much space and often comes in handy. Granted with a
> coprocessor you could easily create a table with counters for all the other
> tables in the system but it would be a nice have as a standard.
>
> - I was also wondering maybe every region could know the number of entries
> it
> contains. Every region already knows the start and endkey of it's entries.
> For a
> count on a given scan this would speed up the count. Every region who's
> start
> and
> and endkey are in the scan would just send back it's population count and
> only a
> region that is wider then the count would need to be scanned and counted.
>
> Wondering if these thoughts are already implemented and if I'm missing
> something
> or would not be a good idea. Altenratly if this is a not a definite No for
> some
> reason could coprocessors allow to implement these thoughts. Can I with a
> coprocessor write in the metrics part, or on a given scan first check if,
> for a
> region smaller than my scan, I already have written somewhere the count
> instead
> of
> scanning and couning.
>
> Thnaks for any thoughts you may have
>
>