Posted to user@hbase.apache.org by Dalia Sobhy <da...@hotmail.com> on 2013/01/01 22:40:37 UTC
RE: Hbase Question

Dear Yong,

How do I distribute my data across the cluster? Note that I am using Cloudera Manager 4.1.

Thanks in advance:D

> Date: Fri, 28 Dec 2012 20:38:22 +0100
> Subject: Re: Hbase Question
> From: yongyong313@gmail.com
> To: user@hbase.apache.org
> 
> I think you should take a look at your row-key design and make sure
> your data is evenly distributed across the cluster. As you mentioned,
> adding more nodes did not improve performance. Maybe one node is a
> hot spot while the other nodes have no work to do.
> 
> regards!
> 
> Yong
> 
> On Tue, Dec 25, 2012 at 3:31 AM, 周梦想 <ab...@gmail.com> wrote:
> > Hi Dalia,
> >
> > I think you can run the test on a small sample of the table; then
> > you'll see the difference between scan and count,
> > because you can check the count by hand.
> >
> > Best regards,
> > Andy
> >
> > 2012/12/24 Dalia Sobhy <da...@hotmail.com>
> >
> >>
> >> Dear all,
> >>
> >> I have 50,000 rows with diagnosis qualifier = "cardiac", and another 50,000
> >> rows with "renal".
> >>
> >> When I type this in the HBase shell,
> >>
> >> import org.apache.hadoop.hbase.filter.CompareFilter
> >> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> >> import org.apache.hadoop.hbase.filter.SubstringComparator
> >> import org.apache.hadoop.hbase.util.Bytes
> >>
> >> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> >>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> >>          Bytes.toBytes('diagnosis'),
> >>          CompareFilter::CompareOp.valueOf('EQUAL'),
> >>          SubstringComparator.new('cardiac'))}
> >>
> >> Output = 50,000 rows
> >>
> >> import org.apache.hadoop.hbase.filter.CompareFilter
> >> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> >> import org.apache.hadoop.hbase.filter.SubstringComparator
> >> import org.apache.hadoop.hbase.util.Bytes
> >>
> >> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> >>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> >>          Bytes.toBytes('diagnosis'),
> >>          CompareFilter::CompareOp.valueOf('EQUAL'),
> >>          SubstringComparator.new('cardiac'))}
> >> Output = 100,000 rows
> >>
> >> I even tried it using the HBase Java API with an AggregationClient instance,
> >> and I enabled the aggregation coprocessor for the table:
> >> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> >>
> >> Also, when I measure performance after adding more nodes,
> >> the operation takes the same time.
> >>
> >> So any advice please?
> >>
> >> I have been stuck in this mess for a couple of weeks.
> >>
> >> Thanks,
> >>
> >>
> >>
> >>
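
On the question of distributing data and avoiding a hot spot: one common approach to row-key design is to "salt" each key with a short bucket prefix, so that sequential keys no longer land in the same region. The sketch below is plain Ruby (the HBase shell is JRuby) and does not talk to HBase at all; it only illustrates how the keys would be rewritten. The bucket count, the key format 'patient000001', and the helper name salted_row_key are assumptions made for illustration and do not come from this thread.

```ruby
require 'digest'

# Assumed number of salt buckets (hypothetical; tune to the cluster size).
NUM_BUCKETS = 8

# Prefix a row key with a stable two-digit bucket derived from an MD5 hash
# of the key, so that sequential keys are spread over NUM_BUCKETS prefixes.
def salted_row_key(row_key, buckets = NUM_BUCKETS)
  bucket = Digest::MD5.hexdigest(row_key).to_i(16) % buckets
  format('%02d-%s', bucket, row_key)
end

# Hypothetical sequential keys, similar to an auto-incremented patient id.
keys = (1..10_000).map { |i| format('patient%06d', i) }

# Count how many keys fall into each bucket prefix.
counts = Hash.new(0)
keys.each { |k| counts[salted_row_key(k)[0, 2]] += 1 }

counts.sort.each { |bucket, n| puts "#{bucket}: #{n}" }
```

The trade-off is on the read side: a range scan over the original key order has to be issued once per bucket prefix and the results merged. Salting only helps if the table actually has several regions, for example by pre-splitting it on the bucket prefixes when it is created.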