Posted to user@hbase.apache.org by Dalia Sobhy <da...@hotmail.com> on 2013/01/01 22:44:47 UTC

RE: Hbase Count Aggregate Function

Thanks Ram,

The issue is resolved. I forgot to add
scan.setFilter(filterList);

That's why it was not filtering!
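
For anyone who lands on this thread later, here is a rough sketch of the working pattern, using the 'patient' table and 'info:diagnosis' column from the shell example further down in this thread (the exact filter contents are illustrative, and the interpreter choice should not matter for a plain row count):

{code}
// Build the filter list and attach it to the Scan with setFilter() BEFORE
// calling rowCount(); without this the coprocessor sees no filter and
// counts every row.
Configuration conf = HBaseConfiguration.create();
AggregationClient aggregationClient = new AggregationClient(conf);

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("diagnosis"));

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filterList.addFilter(new SingleColumnValueFilter(
    Bytes.toBytes("info"), Bytes.toBytes("diagnosis"),
    CompareFilter.CompareOp.EQUAL, new SubstringComparator("cardiac")));
scan.setFilter(filterList);

// Passing null for the interpreter (as done elsewhere in this thread) also
// works, since the interpreter is not really consulted for a plain row count.
long cardiacRows = aggregationClient.rowCount(
    Bytes.toBytes("patient"), new LongColumnInterpreter(), scan);
{code}

AggregationClient and LongColumnInterpreter live in org.apache.hadoop.hbase.client.coprocessor; the filter classes are the same ones imported in the shell example quoted below.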

> Date: Wed, 26 Dec 2012 21:11:32 +0530
> Subject: Re: Hbase Count Aggregate Function
> From: ramkrishna.s.vasudevan@gmail.com
> To: user@hbase.apache.org
> 
> Dalia,
> 
> I tried out this eg,
> 
> {code}
>   private static final byte[] TEST_TABLE = Bytes.toBytes("TestTable");
>   private static final byte[] TEST_FAMILY = Bytes.toBytes("TestFamily");
>   private static final byte[] TEST_QUALIFIER = Bytes.toBytes("TestQualifier");
>   private static final byte[] TEST_MULTI_CQ = Bytes.toBytes("TestMultiCQ");
> 
>   private static byte[] ROW = Bytes.toBytes("testRow");
>   private static final int ROWSIZE = 20;
>   private static final int rowSeperator1 = 5;
>   private static final int rowSeperator2 = 12;
>   private static byte[][] ROWS = makeN(ROW, ROWSIZE);
> 
>     for (int i = 0; i < ROWSIZE; i++) {
>       Put put = new Put(ROWS[i]);
>       put.setWriteToWAL(false);
>       Long l = new Long(i);
>       put.add(TEST_FAMILY, TEST_QUALIFIER, Bytes.toBytes(l));
>       table.put(put);
>       Put p2 = new Put(ROWS[i]);
>       p2.setWriteToWAL(false);
>       p2.add(TEST_FAMILY, Bytes.add(TEST_MULTI_CQ, Bytes.toBytes(l)),
>           Bytes.toBytes(l * 10));
>       table.put(p2);
>     }
> 
>     AggregationClient aClient = new AggregationClient(conf);
>     Scan scan = new Scan();
>     scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
>     final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
>     SingleColumnValueFilter scvf = new SingleColumnValueFilter(TEST_FAMILY,
>         TEST_QUALIFIER, CompareOp.EQUAL, Bytes.toBytes(4L));
>     scan.setFilter(scvf);
>     long rowCount = aClient.rowCount(TEST_TABLE, ci, scan);
>     assertEquals(ROWSIZE, rowCount);
> {code}
> 
> So this assertion fails, which shows the filter is being applied as expected
> (only one row has the value 4, so rowCount is 1 rather than ROWSIZE).  If you
> want to try it out, check the testcase
> TestAggregateProtocol.testRowCountAllTable().
> Just modify that testcase so that it passes a SingleColumnValueFilter.  It is
> working fine for me.
> 
> Please check and let me know.  Maybe I am making some mistake.
> 
> Regards
> Ram
> 
> On Tue, Dec 25, 2012 at 11:25 PM, Dalia Sobhy <da...@hotmail.com>wrote:
> 
> >
> > Is there a problem with making the ID (rowkey) an "int" value?
> >
> > > Date: Tue, 25 Dec 2012 22:44:00 +0530
> > > Subject: Re: Hbase Count Aggregate Function
> > > From: ramkrishna.s.vasudevan@gmail.com
> > > To: user@hbase.apache.org
> > >
> > > @Dalia
> > >
> > > I think the aggregation client should work with what you have passed.
> > > What I meant in the previous mail was about table.count(); now we are
> > > dealing with AggregationClient.
> > > {code}
> > > if (scan.getFilter() == null && qualifier == null)
> > >       scan.setFilter(new FirstKeyOnlyFilter());
> > > {code}
> > >
> > > So since you have passed the filter, it should work just as the SCVF
> > > normally would.  I can check this out when I have some free time (maybe
> > > tomorrow).  If not, you can raise a bug.  If it turns out to be fine then
> > > we can close it out; otherwise it is better we fix it.
> > > I can understand your urgency in this.
> > >
> > > Regards
> > > Ram
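
In other words, reusing the names from the test snippet quoted earlier in this thread, the two call patterns that condition distinguishes look roughly like this (a sketch, not new API):

{code}
// No filter, qualifier given: this is essentially
// TestAggregateProtocol.testRowCountAllTable(), and it counts every row
// that has the column (ROWSIZE in the test data).
Scan all = new Scan();
all.addColumn(TEST_FAMILY, TEST_QUALIFIER);
long allRows = aClient.rowCount(TEST_TABLE, ci, all);

// Filter supplied by the caller: the condition above is false, so the
// SingleColumnValueFilter is sent to the coprocessor as-is and only the
// single row whose value is 4 should be counted.
Scan filtered = new Scan();
filtered.addColumn(TEST_FAMILY, TEST_QUALIFIER);
filtered.setFilter(new SingleColumnValueFilter(TEST_FAMILY, TEST_QUALIFIER,
    CompareOp.EQUAL, Bytes.toBytes(4L)));
long filteredRows = aClient.rowCount(TEST_TABLE, ci, filtered);
{code}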
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Dec 25, 2012 at 10:27 PM, <yu...@gmail.com> wrote:
> > >
> > > > The rowCount method accepts a Scan object to which you can attach your
> > > > custom filter.
> > > >
> > > > Cheers
> > > >
> > > >
> > > >
> > > > On Dec 25, 2012, at 8:42 AM, Dalia Sobhy <da...@hotmail.com>
> > > > wrote:
> > > >
> > > > >
> > > > > Do you mean I should implement a new rowCount method in the
> > > > > AggregationClient class?
> > > > >
> > > > > I cannot understand; could you illustrate with a code sample, Ram?
> > > > >
> > > > >>> Date: Tue, 25 Dec 2012 00:21:14 +0530
> > > > >>> Subject: Re: Hbase Count Aggregate Function
> > > > >>> From: ramkrishna.s.vasudevan@gmail.com
> > > > >>> To: user@hbase.apache.org
> > > > >>>
> > > > >>> Hi
> > > > >>> You could implement a custom filter similar to FirstKeyOnlyFilter.
> > > > >>> Implement the filterKeyValue method so that it matches your KeyValue
> > > > >>> (the specific qualifier that you are looking for).
> > > > >>>
> > > > >>> Deploy it in your cluster.  It should work.
> > > > >>>
> > > > >>> Regards
> > > > >>> Ram
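
For reference, a rough sketch of the kind of custom filter being described: modeled on FirstKeyOnlyFilter but keyed to one qualifier, written against the 0.92/0.94-era Filter API (filterKeyValue(KeyValue) plus Writable serialization). The class name and details are hypothetical, so treat it as a starting point rather than tested code:

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

// Includes at most one KeyValue per row: the first one whose qualifier
// matches. Rows without the qualifier contribute nothing to the scan.
public class FirstMatchingQualifierFilter extends FilterBase {
  private byte[] qualifier;
  private boolean foundMatch = false;

  public FirstMatchingQualifierFilter() {
    // no-arg constructor so the region server can deserialize the filter
  }

  public FirstMatchingQualifierFilter(byte[] qualifier) {
    this.qualifier = qualifier;
  }

  @Override
  public void reset() {
    // called at the start of each row
    foundMatch = false;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    if (foundMatch) {
      return ReturnCode.NEXT_ROW;   // this row already contributed its KeyValue
    }
    if (Bytes.equals(kv.getQualifier(), qualifier)) {
      foundMatch = true;
      return ReturnCode.INCLUDE;    // first KeyValue with the target qualifier
    }
    return ReturnCode.SKIP;         // keep looking within the row
  }

  // Pre-0.96 filters are shipped as Writables, so the qualifier has to be
  // serialized explicitly.
  @Override
  public void write(DataOutput out) throws IOException {
    Bytes.writeByteArray(out, qualifier);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    qualifier = Bytes.readByteArray(in);
  }
}
{code}

As noted above, the jar then has to be on the region servers' classpath before the filter can be used in a scan.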
> > > > >>>
> > > > >>> On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <
> > > > dalia.mohsobhy@hotmail.com>wrote:
> > > > >>>
> > > > >>>>
> > > > >>>> So do you have a suggestion on how to make the filter work?
> > > > >>>>
> > > > >>>>> Date: Mon, 24 Dec 2012 22:22:49 +0530
> > > > >>>>> Subject: Re: Hbase Count Aggregate Function
> > > > >>>>> From: ramkrishna.s.vasudevan@gmail.com
> > > > >>>>> To: user@hbase.apache.org
> > > > >>>>>
> > > > >>>>> Okay, looking at the shell script and the code, I think that when
> > > > >>>>> you use this counter, the user's filter is not taken into account.
> > > > >>>>> It adds a FirstKeyOnlyFilter and proceeds with the scan. :(
> > > > >>>>>
> > > > >>>>> Regards
> > > > >>>>> Ram
> > > > >>>>>
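
Until that is fixed, a plain client-side count over a filtered scan sidesteps the shell's counter entirely. A rough sketch against the 0.92/0.94 client API, with the table and column names from the shell example further down standing in for real ones:

{code}
// Count matching rows by iterating a filtered scan on the client.
HTable table = new HTable(conf, "patient");
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("diagnosis"));
scan.setCaching(1000);   // fewer round trips for a pure counting scan
scan.setFilter(new SingleColumnValueFilter(
    Bytes.toBytes("info"), Bytes.toBytes("diagnosis"),
    CompareFilter.CompareOp.EQUAL, new SubstringComparator("cardiac")));

long matches = 0;
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result r : scanner) {
    matches++;
  }
} finally {
  scanner.close();
  table.close();
}
{code}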
> > > > >>>>> On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <
> > > > >>>> dalia.mohsobhy@hotmail.com>wrote:
> > > > >>>>>
> > > > >>>>>>
> > > > >>>>>> Yeah, scan gives the correct number of rows, while count returns
> > > > >>>>>> the total number of rows.
> > > > >>>>>>
> > > > >>>>>> Both are using the same filter.  I even tried it using the Java
> > > > >>>>>> API, using the row count method:
> > > > >>>>>>
> > > > >>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
> > > > >>>>>>
> > > > >>>>>> I get the total number of rows, not the number of rows filtered.
> > > > >>>>>>
> > > > >>>>>> So any idea?
> > > > >>>>>>
> > > > >>>>>> Thanks Ram :)
> > > > >>>>>>
> > > > >>>>>>> Date: Mon, 24 Dec 2012 21:57:54 +0530
> > > > >>>>>>> Subject: Re: Hbase Count Aggregate Function
> > > > >>>>>>> From: ramkrishna.s.vasudevan@gmail.com
> > > > >>>>>>> To: user@hbase.apache.org
> > > > >>>>>>>
> > > > >>>>>>> So you find that scan with a filter and count with the same
> > > > >>>>>>> filter are giving you different results?
> > > > >>>>>>>
> > > > >>>>>>> Regards
> > > > >>>>>>> Ram
> > > > >>>>>>>
> > > > >>>>>>> On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <
> > > > >>>> dalia.mohsobhy@hotmail.com
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Dear all,
> > > > >>>>>>>>
> > > > >>>>>>>> I have 50,000 rows with diagnosis qualifier = "cardiac", and
> > > > >>>>>>>> another 50,000 rows with "renal".
> > > > >>>>>>>>
> > > > >>>>>>>> When I type this in the HBase shell,
> > > > >>>>>>>>
> > > > >>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
> > > > >>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > > >>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
> > > > >>>>>>>> import org.apache.hadoop.hbase.util.Bytes
> > > > >>>>>>>>
> > > > >>>>>>>> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > > > >>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > > > >>>>>>>>         Bytes.toBytes('diagnosis'),
> > > > >>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
> > > > >>>>>>>>         SubstringComparator.new('cardiac'))}
> > > > >>>>>>>>
> > > > >>>>>>>> Output = 50,000 rows
> > > > >>>>>>>>
> > > > >>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
> > > > >>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > > >>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
> > > > >>>>>>>> import org.apache.hadoop.hbase.util.Bytes
> > > > >>>>>>>>
> > > > >>>>>>>> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > > > >>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > > > >>>>>>>>         Bytes.toBytes('diagnosis'),
> > > > >>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
> > > > >>>>>>>>         SubstringComparator.new('cardiac'))}
> > > > >>>>>>>> Output = 100,000 rows
> > > > >>>>>>>>
> > > > >>>>>>>> I also tried it using the HBase Java API with an
> > > > >>>>>>>> AggregationClient instance, and I enabled the coprocessor
> > > > >>>>>>>> aggregation for the table:
> > > > >>>>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> > > > >>>>>>>>
> > > > >>>>>>>> Also, when measuring the performance improvement from adding
> > > > >>>>>>>> more nodes, the operation takes the same time.
> > > > >>>>>>>>
> > > > >>>>>>>> So any advice please?
> > > > >>>>>>>>
> > > > >>>>>>>> I have been going through all this mess for a couple of weeks.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks,
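
For completeness, since the AggregationClient calls discussed in this thread only work once the aggregation coprocessor is loaded: one common way to enable the stock endpoint on a single table looks roughly like this (0.92/0.94 admin API; the table name is illustrative). The cluster-wide alternative is listing org.apache.hadoop.hbase.coprocessor.AggregateImplementation under hbase.coprocessor.region.classes in hbase-site.xml.

{code}
// Attach the stock aggregate endpoint to one table via its descriptor.
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
byte[] tableName = Bytes.toBytes("patient");

admin.disableTable(tableName);
HTableDescriptor desc = admin.getTableDescriptor(tableName);
desc.addCoprocessor("org.apache.hadoop.hbase.coprocessor.AggregateImplementation");
admin.modifyTable(tableName, desc);
admin.enableTable(tableName);
admin.close();
{code}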
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >>>>
> > > > >>
> > > > >
> > > >
> >
> >

Re: Hbase Count Aggregate Function

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Oh...Oops..

Regards
Ram
