
Posted to user@hbase.apache.org by Dalia Sobhy <da...@hotmail.com> on 2012/12/24 16:03:20 UTC

Hbase Count Aggregate Function

Dear all,
 
I have 50,000 rows where the diagnosis qualifier is "cardiac" and another 50,000 rows where it is "renal".
 
When I run this in the HBase shell:
 
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
 
scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}
 
Output: 50,000 rows
 
But count with the same filter:
 
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
 
count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}
Output: 100,000 rows
 
The same happens when I use the HBase Java API with an AggregationClient instance, with the aggregation coprocessor enabled for the table:
rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
 
Also, when I add more nodes to measure the performance improvement, the operation takes the same time.
 
Any advice, please?
 
I have been struggling with this for a couple of weeks.
 
Thanks,
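The two results can be reproduced without a cluster. The sketch below is a plain-Java model, not HBase code: the class `CountModel`, its methods and the in-memory rows are hypothetical. It assumes, as Ram explains later in the thread, that the shell `count` command of this era does not apply a user-supplied FILTER (it scans with a `FirstKeyOnlyFilter` instead), while `scan` applies the `SingleColumnValueFilter` with its `SubstringComparator`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of the 'patient' table; this is not HBase code.
class CountModel {

    // Builds rows whose "info:diagnosis" value is "cardiac" or "renal".
    static List<Map<String, String>> buildTable(int cardiac, int renal) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < cardiac + renal; i++) {
            Map<String, String> row = new HashMap<>();
            row.put("info:diagnosis", i < cardiac ? "cardiac" : "renal");
            rows.add(row);
        }
        return rows;
    }

    // Mimics scan + SingleColumnValueFilter(EQUAL, SubstringComparator): the match is a
    // case-insensitive substring test, and a row lacking the column is kept because
    // filterIfMissing defaults to false.
    static long scanCount(List<Map<String, String>> rows, String column, String substring) {
        String needle = substring.toLowerCase(Locale.ROOT);
        long n = 0;
        for (Map<String, String> row : rows) {
            String value = row.get(column);
            if (value == null || value.toLowerCase(Locale.ROOT).contains(needle)) {
                n++;
            }
        }
        return n;
    }

    // Mimics a count that ignores the user's filter: every row is counted once.
    static long countIgnoringFilter(List<Map<String, String>> rows) {
        return rows.size();
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = buildTable(50_000, 50_000);
        System.out.println("scan:  " + scanCount(table, "info:diagnosis", "cardiac"));
        System.out.println("count: " + countIgnoringFilter(table));
    }
}
```

Running `main` prints 50000 for the filtered scan and 100000 for the count, the same pair of numbers reported above.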

Re: Hbase Count Aggregate Function

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Hi Dalia,

You already sent the same question yesterday ;) Just give people some
time to look at it.

JM

Re: Hbase Count Aggregate Function

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Oh...Oops..

Regards
Ram

On Wed, Jan 2, 2013 at 3:14 AM, Dalia Sobhy <da...@hotmail.com> wrote:

>
> Thanks Ram,
>
> The issue is resolved. I forgot to add
> scan.setFilter(filterList);
>
> That's why it was not filtering!

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

Thanks Ram,

The issue is resolved. I forgot to add
scan.setFilter(filterList);

That's why it was not filtering!
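The fix can be illustrated with a similar hypothetical model. `ModelScan` and `rowCount` below are stand-ins, not the real `Scan` or `AggregationClient` classes; they only mimic the behaviour described in this thread, where a scan with no filter attached counts every row (the client code Ram quotes installs a `FirstKeyOnlyFilter` when neither a filter nor a qualifier is set). In real code the missing step is `scan.setFilter(filterList)` on the `Scan` passed to `aggregationClient.rowCount(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-ins for Scan and AggregationClient.rowCount; not the HBase classes.
class RowCountModel {

    // A scan that may or may not carry a filter.
    static final class ModelScan {
        private Predicate<String> filter; // stays null until setFilter is called

        void setFilter(Predicate<String> filter) { this.filter = filter; }

        Predicate<String> getFilter() { return filter; }
    }

    // With no filter attached every row is counted, mirroring the behaviour described
    // in the thread; with a filter, only rows that pass it are counted.
    static long rowCount(List<String> diagnoses, ModelScan scan) {
        Predicate<String> filter = scan.getFilter();
        long n = 0;
        for (String diagnosis : diagnoses) {
            if (filter == null || filter.test(diagnosis)) {
                n++;
            }
        }
        return n;
    }

    // 100,000 rows, alternating "cardiac" and "renal".
    static List<String> table() {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            rows.add(i % 2 == 0 ? "cardiac" : "renal");
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = table();
        Predicate<String> cardiac = d -> d.contains("cardiac");

        ModelScan forgotten = new ModelScan(); // filter built but never attached
        System.out.println("without setFilter: " + rowCount(rows, forgotten));

        ModelScan fixed = new ModelScan();
        fixed.setFilter(cardiac); // the step that was missing
        System.out.println("with setFilter:    " + rowCount(rows, fixed));
    }
}
```

The model prints 100000 for the scan whose filter was never attached and 50000 once `setFilter` is called.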

> Date: Wed, 26 Dec 2012 21:11:32 +0530
> Subject: Re: Hbase Count Aggregate Function
> From: ramkrishna.s.vasudevan@gmail.com
> To: user@hbase.apache.org

Re: Hbase Count Aggregate Function

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Dalia,

I tried out this example:

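{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.coprocessor.ColumnInterpreter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import static org.junit.Assert.assertEquals;

  // Fields of the test class; makeN() is a helper defined in that test class.
  private static final byte[] TEST_TABLE = Bytes.toBytes("TestTable");
  private static final byte[] TEST_FAMILY = Bytes.toBytes("TestFamily");
  private static final byte[] TEST_QUALIFIER = Bytes.toBytes("TestQualifier");
  private static final byte[] TEST_MULTI_CQ = Bytes.toBytes("TestMultiCQ");

  private static final byte[] ROW = Bytes.toBytes("testRow");
  private static final int ROWSIZE = 20;
  private static final byte[][] ROWS = makeN(ROW, ROWSIZE);

  // Inside the test method (conf and table are set up elsewhere):
  // row i stores the long value i in TestFamily:TestQualifier.
  for (int i = 0; i < ROWSIZE; i++) {
    long l = i;
    Put put = new Put(ROWS[i]);
    put.setWriteToWAL(false);
    put.add(TEST_FAMILY, TEST_QUALIFIER, Bytes.toBytes(l));
    table.put(put);

    // A second put per row under a different qualifier (TestMultiCQ + i).
    Put p2 = new Put(ROWS[i]);
    p2.setWriteToWAL(false);
    p2.add(TEST_FAMILY, Bytes.add(TEST_MULTI_CQ, Bytes.toBytes(l)),
        Bytes.toBytes(l * 10));
    table.put(p2);
  }

  // Count through the aggregation coprocessor with the filter attached to the Scan.
  AggregationClient aClient = new AggregationClient(conf);
  Scan scan = new Scan();
  scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
  final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
  SingleColumnValueFilter scvf = new SingleColumnValueFilter(TEST_FAMILY,
      TEST_QUALIFIER, CompareOp.EQUAL, Bytes.toBytes(4L));
  scan.setFilter(scvf);
  long rowCount = aClient.rowCount(TEST_TABLE, ci, scan);
  // Fails on purpose: only the row holding 4 matches, so rowCount is 1, not ROWSIZE.
  assertEquals(ROWSIZE, rowCount);
{code}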

This assertion fails, which shows that the filter is working as expected: only
one row matches, so rowCount is not ROWSIZE. If you want to try it yourself,
look at the test case TestAggregateProtocol.testRowCountAllTable() and modify
it so that it passes a SingleColumnValueFilter. It works fine.
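Why a failing assertion is the expected outcome can be checked with a small plain-Java re-enactment of the test data above. The class `RamTestModel` is hypothetical and not part of HBase: row i holds the value i, and an EQUAL filter on 4 keeps exactly one of the ROWSIZE rows.

```java
// Plain-Java re-enactment of the test data; this is not HBase code.
class RamTestModel {

    static final int ROWSIZE = 20;

    // Row i stores the long value i in TestFamily:TestQualifier.
    static long[] values() {
        long[] v = new long[ROWSIZE];
        for (int i = 0; i < ROWSIZE; i++) {
            v[i] = i;
        }
        return v;
    }

    // Stands in for SingleColumnValueFilter(..., CompareOp.EQUAL, Bytes.toBytes(4L)):
    // the real filter compares 8-byte encodings, which for equality is the same as
    // comparing the long values.
    static long countEqual(long[] values, long target) {
        long n = 0;
        for (long x : values) {
            if (x == target) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long rowCount = countEqual(values(), 4L);
        System.out.println("rowCount = " + rowCount);
        System.out.println("equals ROWSIZE? " + (rowCount == ROWSIZE));
    }
}
```

The model prints `rowCount = 1` and `equals ROWSIZE? false`, so an `assertEquals(ROWSIZE, rowCount)` check fails precisely because the filter was applied.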

Please check and let me know.  Maybe I am making a mistake.

Regards
Ram

On Tue, Dec 25, 2012 at 11:25 PM, Dalia Sobhy <da...@hotmail.com>wrote:

>
> Is there a problem in letting ID (rowkey) "int" value??

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

Is there a problem with using an int value as the ID (row key)?

> Date: Tue, 25 Dec 2012 22:44:00 +0530
> Subject: Re: Hbase Count Aggregate Function
> From: ramkrishna.s.vasudevan@gmail.com
> To: user@hbase.apache.org
> 
> @Dalia
> 
> I think the aggregation client should work with what you have passed.  What
> I meant in my previous mail applied to table.count(); with
> AggregationClient, the relevant code is:
> {code}
> if (scan.getFilter() == null && qualifier == null)
>       scan.setFilter(new FirstKeyOnlyFilter());
> {code}
> 
> Since you have passed the filter, it should behave the way the SCVF is
> supposed to.  I can check this out in my free time (maybe tomorrow).
> Otherwise, you can raise a bug.  If it turns out to be fine, we can close it;
> otherwise it's better that we fix it.
> I understand your urgency in this.
> 
> Regards
> Ram
> 
> 
> 
> 
> 
> On Tue, Dec 25, 2012 at 10:27 PM, <yu...@gmail.com> wrote:
> 
> > The rowCount method accepts a Scan object, to which you can attach your
> > custom filter.
> >
> > Cheers
> >
> >
> >
> > On Dec 25, 2012, at 8:42 AM, Dalia Sobhy <da...@hotmail.com> wrote:
> >
> > >
> > > Do you mean I should implement a new rowCount method in the
> > > AggregationClient class?
> > >
> > > I don't understand. Could you illustrate with a code sample, Ram?
> > >
> > >>> Date: Tue, 25 Dec 2012 00:21:14 +0530
> > >>> Subject: Re: Hbase Count Aggregate Function
> > >>> From: ramkrishna.s.vasudevan@gmail.com
> > >>> To: user@hbase.apache.org
> > >>>
> > >>> Hi
> > >>> You could implement a custom filter similar to
> > >>> FirstKeyOnlyFilter.
> > >>> Implement its filterKeyValue method so that it matches your
> > >>> KeyValue (the specific qualifier that you are looking for).
> > >>>
> > >>> Deploy it in your cluster.  It should work.
> > >>>
> > >>> Regards
> > >>> Ram
> > >>>
> > >>> On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <
> > dalia.mohsobhy@hotmail.com>wrote:
> > >>>
> > >>>>
> > >>>> So do you have a suggestion how to enable/work the filter?
> > >>>>
> > >>>>> Date: Mon, 24 Dec 2012 22:22:49 +0530
> > >>>>> Subject: Re: Hbase Count Aggregate Function
> > >>>>> From: ramkrishna.s.vasudevan@gmail.com
> > >>>>> To: user@hbase.apache.org
> > >>>>>
> > >>>>> Okie, seeing the shell script and the code I feel that while you use
> > this
> > >>>>> counter, the user's filter is not taken into account.
> > >>>>> It adds a FirstKeyOnlyFilter and proceeds with the scan. :(.
> > >>>>>
> > >>>>> Regards
> > >>>>> Ram
> > >>>>>
> > >>>>> On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <
> > >>>> dalia.mohsobhy@hotmail.com>wrote:
> > >>>>>
> > >>>>>>
> > >>>>>> yeah scan gives the correct number of rows, while count returns the
> > >>>> total
> > >>>>>> number of rows.
> > >>>>>>
> > >>>>>> Both are using the same filter, I even tried it using Java API,
> > using
> > >>>> row
> > >>>>>> count method.
> > >>>>>>
> > >>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
> > >>>>>>
> > >>>>>> I get the total number of rows not the number of rows filtered.
> > >>>>>>
> > >>>>>> So any idea ??
> > >>>>>>
> > >>>>>> Thanks Ram :)
> > >>>>>>
> > >>>>>>> Date: Mon, 24 Dec 2012 21:57:54 +0530
> > >>>>>>> Subject: Re: Hbase Count Aggregate Function
> > >>>>>>> From: ramkrishna.s.vasudevan@gmail.com
> > >>>>>>> To: user@hbase.apache.org
> > >>>>>>>
> > >>>>>>> So you find that scan with a filter and count with the same filter
> > is
> > >>>>>>> giving you different results?
> > >>>>>>>
> > >>>>>>> Regards
> > >>>>>>> Ram
> > >>>>>>>
> > >>>>>>> On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <
> > >>>> dalia.mohsobhy@hotmail.com
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> Dear all,
> > >>>>>>>>
> > >>>>>>>> I have 50,000 row with diagnosis qualifier = "cardiac", and
> > another
> > >>>>>> 50,000
> > >>>>>>>> rows with "renal".
> > >>>>>>>>
> > >>>>>>>> When I type this in Hbase shell,
> > >>>>>>>>
> > >>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
> > >>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > >>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
> > >>>>>>>> import org.apache.hadoop.hbase.util.Bytes
> > >>>>>>>>
> > >>>>>>>> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > >>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > >>>>>>>>         Bytes.toBytes('diagnosis'),
> > >>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
> > >>>>>>>>         SubstringComparator.new('cardiac'))}
> > >>>>>>>>
> > >>>>>>>> Output = 50,000 row
> > >>>>>>>>
> > >>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
> > >>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > >>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
> > >>>>>>>> import org.apache.hadoop.hbase.util.Bytes
> > >>>>>>>>
> > >>>>>>>> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > >>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > >>>>>>>>         Bytes.toBytes('diagnosis'),
> > >>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
> > >>>>>>>>         SubstringComparator.new('cardiac'))}
> > >>>>>>>> Output = 100,000 row
> > >>>>>>>>
> > >>>>>>>> Even though I tried it using Hbase Java API, Aggregation Client
> > >>>>>> Instance,
> > >>>>>>>> and I enabled the Coprocessor aggregation for the table.
> > >>>>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> > >>>>>>>>
> > >>>>>>>> Also when measuring the improved performance on case of adding
> > more
> > >>>>>> nodes
> > >>>>>>>> the operation takes the same time.
> > >>>>>>>>
> > >>>>>>>> So any advice please?
> > >>>>>>>>
> > >>>>>>>> I have been throughout all this mess from a couple of weeks
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>
> > >
> >

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

Thanks Ram,

I have tried it a lot.

I even tried it in the HBase shell, scanning with filters.

With scan it returns the right number, but the AggregationClient rowCount method still returns the wrong number, as if it cannot see the filter. Even when I pass a value that matches no rows, so the count should be zero, it returns the total number of rows in the table.

So what do you think?
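One hypothesis worth ruling out: SingleColumnValueFilter keeps every row that does not contain the tested column unless setFilterIfMissing(true) is called. If the family or qualifier bytes given to the filter (the CF and Column values in the function posted in this thread) do not exactly match the stored column, every row counts as "missing" the column. The count then equals the table size, whatever value is compared. The plain-Java model below mimics that row decision. It is not the HBase API, and the class, methods, and sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of SingleColumnValueFilter's row decision (not the HBase API).
// A row is a map from "family:qualifier" to value.
class ScvfModel {

    // EQUAL comparison against an exact value, with SCVF's missing-column rule.
    static boolean rowPasses(Map<String, String> r, String column, String value,
                             boolean filterIfMissing) {
        String cell = r.get(column);
        if (cell == null) {
            // Column absent: the row is kept unless filterIfMissing is set.
            return !filterIfMissing;
        }
        return cell.equals(value);
    }

    static long count(List<Map<String, String>> rows, String column, String value,
                      boolean filterIfMissing) {
        long n = 0;
        for (Map<String, String> r : rows) {
            if (rowPasses(r, column, value, filterIfMissing)) {
                n++;
            }
        }
        return n;
    }

    static Map<String, String> rowWith(String column, String value) {
        Map<String, String> r = new HashMap<>();
        r.put(column, value);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                rowWith("info:diagnosis", "cardiac"),
                rowWith("info:diagnosis", "renal"),
                rowWith("info:diagnosis", "cardiac"));
        System.out.println(count(rows, "info:diagnosis", "cardiac", false)); // 2
        // A qualifier that matches no stored cell: every row is "missing" it.
        System.out.println(count(rows, "info:Diagnosis", "cardiac", false)); // 3
        System.out.println(count(rows, "info:Diagnosis", "cardiac", true));  // 0
    }
}
```

If the real counts behave like the second line, checking the exact family and qualifier bytes and calling setFilterIfMissing(true) on the filter would be the first things to try.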

Re: Hbase Count Aggregate Function

Posted by ramkrishna vasudevan <ra...@gmail.com>.

@Dalia

I think the AggregationClient should work with what you have passed. What I
said in my previous mail was about table.count(). For AggregationClient, the
code only adds a FirstKeyOnlyFilter when neither a filter nor a qualifier is set:
{code}
if (scan.getFilter() == null && qualifier == null)
      scan.setFilter(new FirstKeyOnlyFilter());
{code}

Since you have passed a filter, it should behave the way the SCVF
(SingleColumnValueFilter) normally does. I can check this in my free time
(maybe tomorrow). Otherwise, you can raise a bug: if it turns out to be
fine, we can close it; if not, it is better that we fix it.
I understand the urgency.

Regards
Ram

Re: Hbase Count Aggregate Function

Posted by yu...@gmail.com.

The rowCount method accepts a Scan object, to which you can attach your custom filter.
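Ram's earlier suggestion in this thread was a custom filter, similar to FirstKeyOnlyFilter, whose filterKeyValue matches only the qualifier of interest. As noted here, such a filter is simply set on the Scan passed to rowCount. In HBase it would be a Filter subclass deployed on the region servers. The plain-Java model below only sketches the per-cell decision and the resulting row count, assuming a row is counted when at least one of its cells is included. The ReturnCode enum reuses the names of HBase's Filter.ReturnCode but is defined locally, and the class, methods, and sample cells are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a FirstKeyOnlyFilter-style custom filter (not the HBase API).
class QualifierMatchModel {

    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // One cell: row key, "family:qualifier" column, value.
    static final class Cell {
        final String row, column, value;
        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Hypothetical filter: skip other columns; include the target cell if its
    // value matches, otherwise jump to the next row.
    static ReturnCode filterCell(Cell c, String column, String value) {
        if (!c.column.equals(column)) {
            return ReturnCode.SKIP;
        }
        return c.value.equals(value) ? ReturnCode.INCLUDE : ReturnCode.NEXT_ROW;
    }

    // Row count: a row counts once if any of its cells is included.
    static long countRows(List<Cell> cells, String column, String value) {
        long count = 0;
        String doneRow = null;
        for (Cell c : cells) {
            if (c.row.equals(doneRow)) {
                continue; // the rest of this row is already decided
            }
            ReturnCode rc = filterCell(c, column, value);
            if (rc == ReturnCode.INCLUDE) {
                count++;
                doneRow = c.row; // one included cell is enough, like FirstKeyOnlyFilter
            } else if (rc == ReturnCode.NEXT_ROW) {
                doneRow = c.row;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("1", "info:age", "40"),
                new Cell("1", "info:diagnosis", "cardiac"),
                new Cell("2", "info:age", "55"),
                new Cell("2", "info:diagnosis", "renal"),
                new Cell("3", "info:diagnosis", "cardiac"),
                new Cell("4", "info:age", "61"));
        System.out.println(countRows(cells, "info:diagnosis", "cardiac")); // 2
    }
}
```

Unlike SingleColumnValueFilter's default, a row with no diagnosis cell at all (row 4) is not counted in this model.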

Cheers

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

Do you mean that I should implement a new rowCount method in the AggregationClient class?

I don't understand; could you illustrate with a code sample, Ram?

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

Do you mean that I should implement a new rowCount method in the AggregationClient class?

I don't understand; could you illustrate with a code sample, Ram?

Thanks,

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

This is my function:

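import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// customConf, hbaseZookeeperQuorum, configuration, aggregationClient,
// CF, Column and TABLE_NAME are assumed to be fields of the enclosing class.
public long CountByDiagnosis(String diagnosis) throws IOException
{
  customConf.setStrings("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
  customConf.setLong("hbase.rpc.timeout", 600000);
  customConf.setLong("hbase.client.scanner.caching", 1000);
  configuration = HBaseConfiguration.create(customConf);
  aggregationClient = new AggregationClient(configuration);

  // Build a fresh Scan on every call so no state from earlier calls is reused
  Scan scan = new Scan();
  scan.addFamily(CF);

  // Filter by a particular diagnosis (exact match on the whole cell value)
  SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
      CF,
      Column,
      CompareOp.EQUAL,
      Bytes.toBytes(diagnosis));
  scan.setFilter(filter1);

  long rowCount = -1;
  // Count the patients with the given diagnosis; -1 means the call failed
  try {
    rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
  } catch (Throwable e) {
    e.printStackTrace();
  }
  return rowCount;
}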
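Note that this function does not use the same comparison as the shell commands in the original post. Passing Bytes.toBytes(diagnosis) with CompareOp.EQUAL compares the whole cell value byte for byte, while the shell used SubstringComparator, which matches a case-insensitive substring. On values such as "Cardiac" or "cardiac arrest" the two can give different counts even when the filter is applied correctly. The plain-Java model below shows the difference; it is not the HBase API, and the class, methods, and sample values are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Plain-Java model of the two value comparisons used in this thread (not the HBase API).
class ComparatorModel {

    // Like an EQUAL comparison against Bytes.toBytes(target): exact byte equality.
    static boolean exactMatch(String cell, String target) {
        return Arrays.equals(cell.getBytes(StandardCharsets.UTF_8),
                             target.getBytes(StandardCharsets.UTF_8));
    }

    // Like SubstringComparator: case-insensitive "contains".
    static boolean substringMatch(String cell, String target) {
        return cell.toLowerCase(Locale.ROOT).contains(target.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] cells = {"cardiac", "Cardiac", "cardiac arrest", "renal"};
        int exact = 0;
        int substring = 0;
        for (String c : cells) {
            if (exactMatch(c, "cardiac")) {
                exact++;
            }
            if (substringMatch(c, "cardiac")) {
                substring++;
            }
        }
        System.out.println(exact + " " + substring); // 1 3
    }
}
```

To compare the shell and Java results like for like, both sides need the same comparator.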
 


> Date: Tue, 25 Dec 2012 00:21:14 +0530
> Subject: Re: Hbase Count Aggregate Function
> From: ramkrishna.s.vasudevan@gmail.com
> To: user@hbase.apache.org
> 
> Hi
> You could have custom filter implemented which is similar to
> FirstKeyOnlyfilter.
> Implement the filterKeyValue method such that it should match your keyvalue
> (the specific qualifier that you are looking for).
> 
> Deploy it in your cluster.  It should work.
> 
> Regards
> Ram

Re: Hbase Count Aggregate Function

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Hi
You could implement a custom filter similar to FirstKeyOnlyFilter.
Implement its filterKeyValue method so that it matches your KeyValue
(the specific qualifier that you are looking for).

Deploy it in your cluster. It should work.

Regards
Ram
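A simplified, HBase-free Java model of the per-cell logic suggested above may help. It assumes the wanted cell is info:diagnosis with a value containing "cardiac". `Cell`, `ReturnCode`, `filterKeyValue` and `countRows` here are hypothetical stand-ins that only mirror the idea of HBase's `Filter.ReturnCode` (INCLUDE, SKIP, NEXT_ROW). A real filter would instead subclass HBase's `FilterBase` and override `filterKeyValue`, and the class would have to be deployed on the region servers, which is presumably what "deploy it in your cluster" refers to.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * HBase-free model of the custom filter described above. Cell, ReturnCode
 * and the methods below are hypothetical stand-ins, not HBase classes.
 */
public class QualifierMatchFilterModel {

    // Mirrors a subset of HBase's Filter.ReturnCode values.
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // One cell of family "info": row key, qualifier and value.
    static final class Cell {
        final String row, qualifier, value;
        Cell(String row, String qualifier, String value) {
            this.row = row; this.qualifier = qualifier; this.value = value;
        }
    }

    static final String WANTED_QUALIFIER = "diagnosis";   // assumption
    static final String WANTED_SUBSTRING = "cardiac";     // assumption

    // The per-cell decision a filterKeyValue override would make.
    static ReturnCode filterKeyValue(Cell c) {
        if (!c.qualifier.equals(WANTED_QUALIFIER)) {
            return ReturnCode.SKIP;              // not the column we care about
        }
        return c.value.contains(WANTED_SUBSTRING)
                ? ReturnCode.INCLUDE             // keep this one cell
                : ReturnCode.NEXT_ROW;           // wrong value: drop the rest of the row
    }

    // Counts rows that end up with at least one INCLUDEd cell, the way a
    // scanner only returns rows whose result is non-empty. Cells must be
    // grouped by row, as a scan returns them.
    static int countRows(List<Cell> cells) {
        int count = 0;
        String currentRow = null;
        boolean skipRestOfRow = false;
        boolean rowIncluded = false;
        for (Cell c : cells) {
            if (!c.row.equals(currentRow)) {     // a new row starts
                if (rowIncluded) count++;
                currentRow = c.row;
                skipRestOfRow = false;
                rowIncluded = false;
            }
            if (skipRestOfRow) continue;
            ReturnCode rc = filterKeyValue(c);
            if (rc == ReturnCode.INCLUDE) rowIncluded = true;
            else if (rc == ReturnCode.NEXT_ROW) skipRestOfRow = true;
        }
        if (rowIncluded) count++;                // flush the last row
        return count;
    }

    static List<Cell> sampleCells() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("p1", "diagnosis", "cardiac"));
        cells.add(new Cell("p1", "name", "A"));
        cells.add(new Cell("p2", "diagnosis", "renal"));
        cells.add(new Cell("p2", "name", "B"));
        cells.add(new Cell("p3", "diagnosis", "cardiac"));
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(countRows(sampleCells()));   // prints 2
    }
}
```

Because each matching row contributes exactly one INCLUDEd cell and non-matching rows contribute none, counting the returned rows counts only the matches. This is the same "one cell per row" trick that makes FirstKeyOnlyFilter cheap for counting, but with a condition attached.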

On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <da...@hotmail.com> wrote:

>
> So do you have a suggestion for how to enable the filter, or make it work?

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

So do you have a suggestion for how to enable the filter, or make it work?

> Date: Mon, 24 Dec 2012 22:22:49 +0530
> Subject: Re: Hbase Count Aggregate Function
> From: ramkrishna.s.vasudevan@gmail.com
> To: user@hbase.apache.org
> 
> Okay, looking at the shell script and the code, I feel that when you use this
> counter, the user's filter is not taken into account.
> It adds a FirstKeyOnlyFilter and proceeds with the scan. :(
> 
> Regards
> Ram

Re: Hbase Count Aggregate Function

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Okay, looking at the shell script and the code, I feel that when you use this
counter, the user's filter is not taken into account.
It adds a FirstKeyOnlyFilter and proceeds with the scan. :(

Regards
Ram
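To make this diagnosis concrete, here is a small, HBase-free Java model. `ModelScan`, `filteredScanCount` and `shellStyleCount` are hypothetical names, not HBase API, and each "row" keeps only its info:diagnosis value. The model simply assumes what is described above: count sets its own first-key-only filter on the scan, so whatever filter the user passed has no effect and every row is counted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/**
 * HBase-free model of the count/scan discrepancy. ModelScan and the
 * methods below are hypothetical stand-ins, not HBase classes.
 */
public class CountVsScanModel {

    // Hypothetical stand-in for a Scan that carries a single row filter.
    static final class ModelScan {
        Predicate<String> filter = value -> true;
        void setFilter(Predicate<String> f) { this.filter = f; }
    }

    // Builds a table shaped like the one in the thread: row key -> diagnosis.
    static Map<String, String> buildTable(int cardiac, int renal) {
        Map<String, String> table = new LinkedHashMap<>();
        for (int i = 0; i < cardiac; i++) table.put("c" + i, "cardiac");
        for (int i = 0; i < renal; i++) table.put("r" + i, "renal");
        return table;
    }

    // Counts the rows that survive the scan's filter.
    static long run(Map<String, String> table, ModelScan scan) {
        return table.values().stream().filter(scan.filter).count();
    }

    // Like scan with SubstringComparator('cardiac'): the filter is honoured.
    static long filteredScanCount(Map<String, String> table, String substring) {
        ModelScan scan = new ModelScan();
        scan.setFilter(value -> value.contains(substring));
        return run(table, scan);
    }

    // Like count as described above: the user's filter is replaced by a
    // first-key-only filter, which lets one cell of every row through.
    static long shellStyleCount(Map<String, String> table, String substring) {
        ModelScan scan = new ModelScan();
        scan.setFilter(value -> value.contains(substring)); // user's filter...
        scan.setFilter(value -> true);                      // ...overwritten
        return run(table, scan);
    }

    public static void main(String[] args) {
        Map<String, String> table = buildTable(50_000, 50_000);
        System.out.println("scan:  " + filteredScanCount(table, "cardiac"));
        System.out.println("count: " + shellStyleCount(table, "cardiac"));
    }
}
```

Running `main` prints `scan:  50000` and `count: 100000`, the same split Dalia reports.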

On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <da...@hotmail.com> wrote:

>
> Yes, scan gives the correct number of rows, while count returns the total
> number of rows.
>
> Both use the same filter. I even tried it with the Java API, using the
> rowCount method:
>
> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
>
> I get the total number of rows, not the number of filtered rows.
>
> Any ideas?
>
> Thanks Ram :)

RE: Hbase Count Aggregate Function

Posted by Dalia Sobhy <da...@hotmail.com>.

Yes, scan gives the correct number of rows, while count returns the total number of rows.

Both use the same filter. I even tried it with the Java API, using the rowCount method:

rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

I get the total number of rows, not the number of filtered rows.

Any ideas?

Thanks Ram :)

> Date: Mon, 24 Dec 2012 21:57:54 +0530
> Subject: Re: Hbase Count Aggregate Function
> From: ramkrishna.s.vasudevan@gmail.com
> To: user@hbase.apache.org
> 
> So you find that scan with a filter and count with the same filter are
> giving you different results?
> 
> Regards
> Ram

Re: Hbase Count Aggregate Function

Posted by ramkrishna vasudevan <ra...@gmail.com>.

So you find that scan with a filter and count with the same filter are
giving you different results?

Regards
Ram

On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <da...@hotmail.com> wrote:

>
> Dear all,
>
> I have 50,000 rows whose diagnosis qualifier is "cardiac", and another 50,000
> rows with "renal".
>
> When I type this in the HBase shell:
>
> import org.apache.hadoop.hbase.filter.CompareFilter
> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> import org.apache.hadoop.hbase.filter.SubstringComparator
> import org.apache.hadoop.hbase.util.Bytes
>
> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
>          Bytes.toBytes('diagnosis'),
>          CompareFilter::CompareOp.valueOf('EQUAL'),
>          SubstringComparator.new('cardiac'))}
>
> Output = 50,000 rows
>
> import org.apache.hadoop.hbase.filter.CompareFilter
> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> import org.apache.hadoop.hbase.filter.SubstringComparator
> import org.apache.hadoop.hbase.util.Bytes
>
> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
>          Bytes.toBytes('diagnosis'),
>          CompareFilter::CompareOp.valueOf('EQUAL'),
>          SubstringComparator.new('cardiac'))}
> Output = 100,000 rows
>
> I get the same result with the HBase Java API, using an AggregationClient
> instance with the aggregation coprocessor enabled for the table:
> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
>
> Also, when measuring the performance improvement from adding more nodes,
> the operation takes the same time.
>
> Any advice, please?
>
> I have been going through all this mess for a couple of weeks.
>
> Thanks,