Posted to user@hbase.apache.org by Dalia Sobhy <da...@hotmail.com> on 2012/12/24 00:26:35 UTC

Hbase Question

Dear all,

I have 50,000 rows with diagnosis qualifier "cardiac", and another 50,000 rows with "renal".

When I run this in the HBase shell:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}

Output = 50,000 rows

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}
Output = 100,000 rows

I also tried it using the HBase Java API with an AggregationClient instance, after enabling the aggregation coprocessor for the table:
rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)

Also, when measuring performance after adding more nodes, the operation takes the same time.

So any advice please?

I have been struggling with this mess for a couple of weeks.

Thanks,



RE: Hbase Question

Posted by Dalia Sobhy <da...@hotmail.com>.
Dear Yong,

How do I distribute my data in the cluster? Note that I am using Cloudera Manager 4.1.

Thanks in advance :D

> Date: Fri, 28 Dec 2012 20:38:22 +0100
> Subject: Re: Hbase Question
> From: yongyong313@gmail.com
> To: user@hbase.apache.org
> 
> I think you should take a look at your row-key design and distribute
> your data evenly across your cluster; as you mentioned, even after
> adding more nodes there was no performance improvement. You may have
> one node that is a hot spot while the other nodes have no work to do.
> 
> regards!
> 
> Yong
> 
> On Tue, Dec 25, 2012 at 3:31 AM, 周梦想 <ab...@gmail.com> wrote:
> > [earlier quoted messages trimmed; see the full copies below in the thread]

Re: Hbase Question

Posted by yonghu <yo...@gmail.com>.
I think you should take a look at your row-key design and distribute
your data evenly across your cluster; as you mentioned, even after
adding more nodes there was no performance improvement. You may have
one node that is a hot spot while the other nodes have no work to do.

regards!

Yong
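
A common generic technique for the hot-spotting problem described above is to "salt" row keys with a hash-derived bucket prefix so that sequential keys spread across regions. The sketch below uses made-up names and a made-up bucket count purely for illustration; it is not code from this thread, and in practice the bucket count would match the table's pre-split regions:

```java
public class SaltedKeys {
    // Hypothetical bucket count; in practice, roughly one per pre-split region.
    static final int BUCKETS = 8;

    // Prefix the natural key with a stable bucket id derived from its hash,
    // so sequential patient ids no longer all land on a single region server.
    static String salt(String naturalKey) {
        int bucket = Math.floorMod(naturalKey.hashCode(), BUCKETS);
        return String.format("%02d-%s", bucket, naturalKey);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"patient-000001", "patient-000002", "patient-000003"}) {
            System.out.println(salt(id));
        }
    }
}
```

The trade-off is that reads by natural key must either recompute the same bucket prefix or fan out one scan per bucket, so salting suits write-heavy tables more than range-scan-heavy ones.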

On Tue, Dec 25, 2012 at 3:31 AM, 周梦想 <ab...@gmail.com> wrote:
> [earlier quoted messages trimmed; see the full copies below in the thread]

Re: Hbase Question

Posted by 周梦想 <ab...@gmail.com>.
Hi Dalia,

I think you can build a small sample of the table and run the test on
that; then you will see how scan and count differ, because on a small
sample you can verify the counts by hand.

Best regards,
Andy
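
Andy's suggestion can be mimicked with a toy in-memory sample: on a table small enough to count by hand, it is obvious when a filtered scan and an unfiltered count are answering different questions, which is what the shell output in the original message suggests. This is plain Java standing in for the shell behaviour, not HBase code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScanVsCount {
    // Mimics a filtered scan: count only rows whose diagnosis contains the term.
    static long filteredCount(Map<String, String> rows, String term) {
        return rows.values().stream().filter(v -> v.contains(term)).count();
    }

    // Mimics a count that ignores the filter: every row is counted.
    static long unfilteredCount(Map<String, String> rows) {
        return rows.size();
    }

    public static void main(String[] args) {
        // A four-row sample "table": row key -> info:diagnosis value.
        Map<String, String> patient = new LinkedHashMap<>();
        patient.put("row1", "cardiac");
        patient.put("row2", "renal");
        patient.put("row3", "cardiac");
        patient.put("row4", "renal");

        System.out.println("filtered scan    = " + filteredCount(patient, "cardiac")); // 2
        System.out.println("unfiltered count = " + unfilteredCount(patient));          // 4
    }
}
```

If the real shell `count` returns the full row total while the filtered `scan` returns half, that points to the filter not being applied by `count`, exactly the discrepancy a hand-countable sample makes visible.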

2012/12/24 Dalia Sobhy <da...@hotmail.com>

> [quoted original message trimmed; see the full copy at the top of the thread]