Posted to user@hbase.apache.org by Tom Brown <to...@gmail.com> on 2012/08/03 22:27:36 UTC

Need to fast-forward a scanner inside a coprocessor

I have a custom coprocessor that aggregates a selection of records
from the table based on various criteria. For efficiency, I would like to
make it skip a bunch of records. For example, if I don't need any
"AAAA" records and I encounter "AAAA0000", I would like to tell it to
skip everything until "AAAB.."

I don't see any methods of the InternalScanner interface that would give
me that ability. Do I need to close the current scanner and open a new
one? Does that add significant overhead (which would reduce any gains
achieved by skipping small numbers of records)?

I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this
functionality.

--Tom
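
For readers following along: the skip target in the example above ("AAAB" once an "AAAA" row is seen) can be derived from the unwanted prefix. Below is a minimal sketch in plain Java, assuming row keys are compared as unsigned bytes in lexicographic order (which is how HBase sorts row keys). The class and method names are hypothetical and not part of any HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase.
class SkipTarget {
    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix, or null if there is none (the prefix is empty or
     * consists only of 0xFF bytes).
     */
    static byte[] firstKeyAfterPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return null;
        }
        // Copy the remaining bytes and bump the last one by one.
        byte[] next = Arrays.copyOf(prefix, end);
        next[end - 1]++;
        return next;
    }

    public static void main(String[] args) {
        byte[] target = firstKeyAfterPrefix("AAAA".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(target, StandardCharsets.US_ASCII));
    }
}
```

The resulting byte array is the kind of value one would hand to a seek call as the row to jump to.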

Re: Need to fast-forward a scanner inside a coprocessor

Posted by lars hofhansl <lh...@yahoo.com>.
Oh... I just meant you need to have your hands on a RegionScanner :)
As long as you only scan forward it should work.
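
To make the "only scan forward" point concrete, here is a rough sketch of an aggregation loop that skips a block of rows with a forward reseek. It deliberately does not use HBase classes: the scanner is an in-memory stand-in built on a TreeMap, and every name in it (ForwardScannerSketch, sumSkippingPrefix) is hypothetical. Treating a reseek to an earlier key as a no-op is a choice made for this sketch, not a statement about how RegionScanner.reseek behaves.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a forward-only scanner; not HBase code.
class ForwardScannerSketch {
    private final NavigableMap<String, Integer> rows;
    private String position; // key of the next row to return, null when done

    ForwardScannerSketch(NavigableMap<String, Integer> rows) {
        this.rows = rows;
        this.position = rows.isEmpty() ? null : rows.firstKey();
    }

    /** Returns the next row, or null when the scan is finished. */
    Map.Entry<String, Integer> next() {
        if (position == null) {
            return null;
        }
        Map.Entry<String, Integer> current = rows.ceilingEntry(position);
        position = rows.higherKey(position);
        return current;
    }

    /**
     * Moves forward to the first row at or after target. A target at or
     * before the current position leaves the scanner where it is.
     */
    boolean reseek(String target) {
        if (position == null) {
            return false;
        }
        if (target.compareTo(position) > 0) {
            position = rows.ceilingKey(target);
        }
        return position != null;
    }

    /** Sums all values, jumping to resumeKey whenever skipPrefix is seen. */
    static int sumSkippingPrefix(NavigableMap<String, Integer> rows,
                                 String skipPrefix, String resumeKey) {
        ForwardScannerSketch scanner = new ForwardScannerSketch(rows);
        int sum = 0;
        Map.Entry<String, Integer> row;
        while ((row = scanner.next()) != null) {
            if (row.getKey().startsWith(skipPrefix)) {
                if (!scanner.reseek(resumeKey)) {
                    break; // nothing left at or after the resume key
                }
                continue;
            }
            sum += row.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> rows = new TreeMap<>();
        rows.put("AAA90000", 5);
        rows.put("AAAA0000", 1);
        rows.put("AAAA0001", 2);
        rows.put("AAAB0000", 10);
        rows.put("AAAC0000", 20);
        System.out.println(sumSkippingPrefix(rows, "AAAA", "AAAB"));
    }
}
```

In a real coprocessor the loop would read from the RegionScanner and call its reseek with the byte[] row to jump to; the control flow would be similar, but the result handling differs.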



----- Original Message -----
From: Tom Brown <to...@gmail.com>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Cc: 
Sent: Friday, August 3, 2012 5:47 PM
Subject: Re: Need to fast-forward a scanner inside a coprocessor

So I understand I'll need to upgrade to 0.94 (which won't be a problem
because the releases are binary-compatible).  I see that the
RegionScanner interface contains the new method "reseek(byte[] row)".

I have a reference to a RegionScanner in my coprocessor because I'm
using: getEnvironment().getRegion().getScanner(scan).

What I don't understand is your conditional statement "it depends
specifically on where you hook this up".  I'm not doing anything with
"postScannerOpen".  Since I have an instance of a RegionScanner,
should I expect "reseek" to work, as long as I'm seeking forward? Is
the way I'm hooking it up compatible with how it should work?

--Tom


Re: Need to fast-forward a scanner inside a coprocessor

Posted by Tom Brown <to...@gmail.com>.
So I understand I'll need to upgrade to 0.94 (which won't be a problem
because the releases are binary-compatible).  I see that the
RegionScanner interface contains the new method "reseek(byte[] row)".

I have a reference to a RegionScanner in my coprocessor because I'm
using: getEnvironment().getRegion().getScanner(scan).

What I don't understand is your conditional statement "it depends
specifically on where you hook this up".  I'm not doing anything with
"postScannerOpen".  Since I have an instance of a RegionScanner,
should I expect "reseek" to work, as long as I'm seeking forward? Is
the way I'm hooking it up compatible with how it should work?

--Tom

On Fri, Aug 3, 2012 at 3:05 PM, lars hofhansl <lh...@yahoo.com> wrote:
> We recently added a new API for that:
> RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately.
>
> So it depends specifically on where you hook this up. If you do it at RegionObserver.postScannerOpen you can reseek forward at any time.
>
>
> -- Lars

Re: Need to fast-forward a scanner inside a coprocessor

Posted by lars hofhansl <lh...@yahoo.com>.
We recently added a new API for that:
RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately.

So it depends specifically on where you hook this up. If you do it at RegionObserver.postScannerOpen you can reseek forward at any time.


-- Lars



----- Original Message -----
From: Tom Brown <to...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 3, 2012 1:27 PM
Subject: Need to fast-forward a scanner inside a coprocessor

I have a custom coprocessor that aggregates a selection of records
from the table based on various criteria. For efficiency, I would like to
make it skip a bunch of records. For example, if I don't need any
"AAAA" records and I encounter "AAAA0000", I would like to tell it to
skip everything until "AAAB.."

I don't see any methods of the InternalScanner interface that would give
me that ability. Do I need to close the current scanner and open a new
one? Does that add significant overhead (which would reduce any gains
achieved by skipping small numbers of records)?

I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this
functionality.

--Tom