Posted to user@hbase.apache.org by Carl Austin <ca...@gmail.com> on 2014/07/01 11:43:10 UTC

Accumulo iterators in HBase

Hi,

I've recently been doing a little research into getting Accumulo iterators
working in HBase, and in my very basic example I seem to have been able to
do this for all three scopes (scan, minor compaction and major compaction,
or scan, flush and compaction in HBase terminology).
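The scope correspondence can be written down as a small lookup table. This is a hypothetical sketch, not code from the repository. The enum and class names are invented, and the hook names are the 0.98-era HBase RegionObserver methods one would expect to use (postScannerOpen, preFlush, preCompact); the repository may hook in elsewhere. HBase sends both its minor and major compactions through preCompact, and both correspond to Accumulo's majc, because Accumulo's minc is what HBase calls a flush.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum mirroring Accumulo's three iterator scopes (scan, minc, majc).
enum IteratorScopeSketch { SCAN, MINC, MAJC }

final class ScopeToHook {
    private static final Map<IteratorScopeSketch, String> HOOKS =
            new EnumMap<>(IteratorScopeSketch.class);

    static {
        // Client reads: wrap the RegionScanner handed back to the client.
        HOOKS.put(IteratorScopeSketch.SCAN, "postScannerOpen");
        // Accumulo minor compaction corresponds to an HBase memstore flush.
        HOOKS.put(IteratorScopeSketch.MINC, "preFlush");
        // Accumulo major compaction corresponds to any HBase compaction.
        HOOKS.put(IteratorScopeSketch.MAJC, "preCompact");
    }

    private ScopeToHook() {}

    /** Returns the RegionObserver method name that would host an iterator for this scope. */
    static String hookFor(IteratorScopeSketch scope) {
        return HOOKS.get(scope);
    }
}
```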

I was hoping that an HBase guru would be able to take a look at my
approach: https://github.com/carlaustin/hbase-accumulo-iterators. It's very
simple, just 7 small classes.

I've done it by creating wrappers that convert Accumulo iterators to HBase
scanners and back. This lets me wrap a scanner as an iterator, hand it to an
Accumulo iterator as the start of an iterator chain, and then wrap the end
of that chain back into a scanner and return it. I then used a
RegionObserver to apply this on flush, compaction and scan.
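The two-way wrapping can be sketched with simplified stand-ins. SimpleCell, SimpleScanner and SimpleIterator below are hypothetical reductions of HBase's Cell/InternalScanner and Accumulo's SortedKeyValueIterator: the real interfaces also have seek, init, deepCopy and close, and InternalScanner batches by row rather than by a fixed count. What the sketch illustrates is the mismatch the wrappers must bridge: a scanner hands out batches, while an iterator exposes one current entry, so one adapter buffers and the other drains.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for an HBase Cell / Accumulo Key+Value pair.
final class SimpleCell {
    final String row;
    final String value;
    SimpleCell(String row, String value) { this.row = row; this.value = value; }
}

// Hypothetical stand-in for HBase's InternalScanner: appends a batch to
// `results` and returns true while more data remains.
interface SimpleScanner {
    boolean next(List<SimpleCell> results);
}

// Hypothetical stand-in for Accumulo's SortedKeyValueIterator (no seek/init/deepCopy).
interface SimpleIterator {
    boolean hasTop();
    SimpleCell top();
    void next();
}

// Scanner -> iterator: buffers one batch and exposes it one cell at a time.
final class ScannerAsIterator implements SimpleIterator {
    private final SimpleScanner scanner;
    private final Deque<SimpleCell> buffer = new ArrayDeque<>();
    private boolean more = true;

    ScannerAsIterator(SimpleScanner scanner) {
        this.scanner = scanner;
        refill();
    }

    // Pull batches until a cell is available or the scanner is exhausted.
    private void refill() {
        while (buffer.isEmpty() && more) {
            List<SimpleCell> batch = new ArrayList<>();
            more = scanner.next(batch);
            buffer.addAll(batch);
        }
    }

    public boolean hasTop() { return !buffer.isEmpty(); }
    public SimpleCell top() { return buffer.peekFirst(); }
    public void next() { buffer.pollFirst(); refill(); }
}

// Iterator -> scanner: drains up to batchSize cells per call.
final class IteratorAsScanner implements SimpleScanner {
    private final SimpleIterator iter;
    private final int batchSize;

    IteratorAsScanner(SimpleIterator iter, int batchSize) {
        this.iter = iter;
        this.batchSize = batchSize;
    }

    public boolean next(List<SimpleCell> results) {
        for (int n = 0; n < batchSize && iter.hasTop(); n++) {
            results.add(iter.top());
            iter.next();
        }
        return iter.hasTop();
    }
}

// Demo source: serves a fixed list in batches of two.
final class ListScanner implements SimpleScanner {
    private final List<SimpleCell> cells;
    private int pos = 0;

    ListScanner(List<SimpleCell> cells) { this.cells = cells; }

    public boolean next(List<SimpleCell> results) {
        int end = Math.min(pos + 2, cells.size());
        results.addAll(cells.subList(pos, end));
        pos = end;
        return pos < cells.size();
    }
}
```

In an actual observer, the Accumulo iterator chain would sit between the two adapters: ScannerAsIterator feeds the chain's source, and IteratorAsScanner turns the chain's output back into the scanner returned from the hook.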

As you can see from the example, I've done no iterator management or
anything like that at this point; it simply applies an iterator that
changes all values to the word "carl" for a table called "test". If this
looks like a goer, I'll continue the work.
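A value-rewriting iterator like the one in the example is essentially a decorator over its source. The sketch below uses hypothetical stand-ins (KV, KVIterator, ListSource, ExamplePolicy) rather than the real Accumulo classes; in Accumulo itself one would more likely extend WrappingIterator and override getTopValue(). The table-name check is likewise illustrative, and how the real observer obtains the table name is not shown.

```java
import java.util.List;

// Hypothetical stand-in for an Accumulo Key/Value pair.
final class KV {
    final String row;
    final String column;
    final String value;

    KV(String row, String column, String value) {
        this.row = row;
        this.column = column;
        this.value = value;
    }
}

// Hypothetical minimal iterator contract (not the real SortedKeyValueIterator).
interface KVIterator {
    boolean hasTop();
    KV top();
    void next();
}

// Source over an in-memory list that is assumed to be sorted already.
final class ListSource implements KVIterator {
    private final List<KV> data;
    private int pos = 0;

    ListSource(List<KV> data) { this.data = data; }

    public boolean hasTop() { return pos < data.size(); }
    public KV top() { return data.get(pos); }
    public void next() { pos++; }
}

// Decorator: passes keys through unchanged and replaces every value.
final class ConstantValueIterator implements KVIterator {
    private final KVIterator source;
    private final String replacement;

    ConstantValueIterator(KVIterator source, String replacement) {
        this.source = source;
        this.replacement = replacement;
    }

    public boolean hasTop() { return source.hasTop(); }

    public KV top() {
        KV kv = source.top();
        return new KV(kv.row, kv.column, replacement);
    }

    public void next() { source.next(); }
}

final class ExamplePolicy {
    private ExamplePolicy() {}

    // Mirrors the example's scope: only the table named "test" is rewritten.
    static KVIterator maybeWrap(String tableName, KVIterator source) {
        return "test".equals(tableName) ? new ConstantValueIterator(source, "carl") : source;
    }
}
```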

I'd really appreciate any comments on the approach, or on things I've
missed, even if they make this a total non-starter.

Thanks

Carl

Re: Accumulo iterators in HBase

Posted by Stack <st...@duboce.net>.

On Wed, Jul 2, 2014 at 12:07 AM, Carl Austin <ca...@gmail.com> wrote:

> [...]

Thank you for sharing your experience.  I'm watching your repo.  Feel free
to ping me off-list if you want an opinion on how to hbase it or if you
want a review.

Thanks Carl,
St.Ack

Re: Accumulo iterators in HBase

Posted by Carl Austin <ca...@gmail.com>.

Thanks for taking the time to look and comment; I'm glad it sounds interesting.

The reason I started on this is that I use Accumulo and want to make an
application that runs on both HBase and Accumulo from the same codebase. I
do a lot of aggregation on data, and I feel the Accumulo iterator mechanism
is superior for this use case; it's one of the main reasons I went with
Accumulo, and one of the few major differences left between the two
systems now that HBase has implemented cell-level ACLs.
For example, as I ingest a main table of data I also build many other
question-focused tables that keep answers such as how many times I saw
particular combinations of values, when I last saw those combinations
together, and how many distinct values appeared in a field for each
combination (using probabilistic counting, of course), among many more. All
of these are well suited to Accumulo iterators for performance at scale,
because the iterators run at compaction time over key/values that are
already being read at that point, rather than the answers having to be
updated on every single insert.
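The aggregations listed above are all associative and commutative, which is what makes compaction-time merging work: a flush or compaction reads entries in sorted key order, so equal keys arrive adjacent and can be folded together. The sketch below is hypothetical. The AggregateEntry layout is invented, and HyperLogLog is assumed as the probabilistic counter because the message does not say which one is used. It shows three merge rules: counts add, last-seen takes the maximum, and HyperLogLog register arrays merge by element-wise maximum. In Accumulo, the first two correspond to stock combiners such as SummingCombiner and MaxCombiner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregate row: key, times seen, last-seen time, distinct-count sketch.
final class AggregateEntry {
    final String key;
    final long count;
    final long lastSeen;
    final int[] registers; // assumed HyperLogLog-style registers

    AggregateEntry(String key, long count, long lastSeen, int[] registers) {
        this.key = key;
        this.count = count;
        this.lastSeen = lastSeen;
        this.registers = registers;
    }
}

final class CompactionCombiner {
    private CompactionCombiner() {}

    // Folds adjacent entries with equal keys; input must be sorted by key,
    // as it is when a flush or compaction merges sorted data.
    static List<AggregateEntry> combine(List<AggregateEntry> sorted) {
        List<AggregateEntry> out = new ArrayList<>();
        AggregateEntry acc = null;
        for (AggregateEntry e : sorted) {
            if (acc != null && acc.key.equals(e.key)) {
                acc = new AggregateEntry(acc.key,
                        acc.count + e.count,                         // counts add
                        Math.max(acc.lastSeen, e.lastSeen),          // latest sighting wins
                        mergeRegisters(acc.registers, e.registers)); // sketches merge
            } else {
                if (acc != null) out.add(acc);
                acc = e;
            }
        }
        if (acc != null) out.add(acc);
        return out;
    }

    // HyperLogLog sketches of equal size merge by element-wise maximum.
    static int[] mergeRegisters(int[] a, int[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("sketch sizes differ");
        int[] merged = new int[a.length];
        for (int i = 0; i < a.length; i++) merged[i] = Math.max(a[i], b[i]);
        return merged;
    }
}
```

Because a flush or partial compaction sees only some of the data, stored values remain partial aggregates until everything has been merged. That is why such combiners are usually configured at scan scope as well as minc and majc, so reads fold the remaining pieces together.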

This use case won't be for everyone, but the iterator mechanism is pretty
neat, powerful and a real differentiator in Accumulo (of course there are
many differentiators in HBase too!).

Thanks

Carl

On Tue, Jul 1, 2014 at 6:57 PM, Stack <st...@duboce.net> wrote:

> [...]

Re: Accumulo iterators in HBase

Posted by Stack <st...@duboce.net>.

Interesting project, Carl.  Use the Cell interface instead of KeyValue if you
can (especially given you are copying to Accumulo Key/Value).  What are you
thinking? What would be the use case?
Thanks,
St.Ack
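The point about Cell versus KeyValue is that Cell exposes each component as a (backing array, offset, length) triple without tying callers to KeyValue's byte layout, so a converter should copy each component out through those accessors. CellLike below is a hypothetical, cut-down mirror of that accessor style; the real Cell has more methods, and HBase's CellUtil.cloneRow, cloneFamily, cloneQualifier and cloneValue do the same slicing. AccumuloParts is a hypothetical holder for the pieces an Accumulo Key and Value are built from. Leaving the column visibility empty is an assumption; mapping HBase visibility labels is not addressed.

```java
import java.util.Arrays;

// Hypothetical cut-down mirror of HBase's Cell accessor style:
// every component is (backing array, offset, length).
interface CellLike {
    byte[] getRowArray();       int getRowOffset();       int getRowLength();
    byte[] getFamilyArray();    int getFamilyOffset();    int getFamilyLength();
    byte[] getQualifierArray(); int getQualifierOffset(); int getQualifierLength();
    byte[] getValueArray();     int getValueOffset();     int getValueLength();
    long getTimestamp();
}

// Hypothetical holder for the pieces an Accumulo Key/Value pair is built from.
final class AccumuloParts {
    final byte[] row, family, qualifier, visibility, value;
    final long timestamp;

    AccumuloParts(byte[] row, byte[] family, byte[] qualifier,
                  byte[] visibility, long timestamp, byte[] value) {
        this.row = row; this.family = family; this.qualifier = qualifier;
        this.visibility = visibility; this.timestamp = timestamp; this.value = value;
    }
}

final class CellConverter {
    private CellConverter() {}

    // Copy exactly [offset, offset + length) so the result never aliases
    // the (possibly shared) backing buffer.
    private static byte[] slice(byte[] array, int offset, int length) {
        return Arrays.copyOfRange(array, offset, offset + length);
    }

    static AccumuloParts toAccumulo(CellLike c) {
        return new AccumuloParts(
                slice(c.getRowArray(), c.getRowOffset(), c.getRowLength()),
                slice(c.getFamilyArray(), c.getFamilyOffset(), c.getFamilyLength()),
                slice(c.getQualifierArray(), c.getQualifierOffset(), c.getQualifierLength()),
                new byte[0], // assumption: no column visibility carried over
                c.getTimestamp(),
                slice(c.getValueArray(), c.getValueOffset(), c.getValueLength()));
    }
}

// Test double: all components live in one shared buffer, as they do in a KeyValue.
final class SharedBufferCell implements CellLike {
    private final byte[] buf;
    private final int[] off; // index 0 = row, 1 = family, 2 = qualifier, 3 = value
    private final int[] len;
    private final long ts;

    SharedBufferCell(byte[] buf, int[] off, int[] len, long ts) {
        this.buf = buf; this.off = off; this.len = len; this.ts = ts;
    }

    public byte[] getRowArray() { return buf; }
    public int getRowOffset() { return off[0]; }
    public int getRowLength() { return len[0]; }
    public byte[] getFamilyArray() { return buf; }
    public int getFamilyOffset() { return off[1]; }
    public int getFamilyLength() { return len[1]; }
    public byte[] getQualifierArray() { return buf; }
    public int getQualifierOffset() { return off[2]; }
    public int getQualifierLength() { return len[2]; }
    public byte[] getValueArray() { return buf; }
    public int getValueOffset() { return off[3]; }
    public int getValueLength() { return len[3]; }
    public long getTimestamp() { return ts; }
}
```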


On Tue, Jul 1, 2014 at 2:43 AM, Carl Austin <ca...@gmail.com> wrote:

> [...]