Posted to dev@accumulo.apache.org by Keith Turner <ke...@deenlo.com> on 2012/03/08 00:22:36 UTC

Time based locality groups

We regularly have questions from users about querying new data and
aging off old data.  I was thinking about how we could better support
this need in 1.5.  One thing that occurred to me is having locality
groups that are based on timestamp instead of column family.  For
example, a locality group for each month.  Alternatively, we could have
groups for < day old, < week old, < month old, < year old.  We would
need a way for users to define these.

This would make scanning a table for recent data much faster.  Also
dropping old data could be made much faster by just dropping entire
locality groups at compaction time.
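
A minimal sketch of the age-bucket idea, assuming the four buckets listed
above plus a catch-all "older" group. The class AgeBuckets and its method
names are hypothetical and are not part of any Accumulo API; the 30-day
month and 365-day year are simplifications.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an Accumulo API.
final class AgeBuckets {
    // Bucket names and exclusive upper age bounds, youngest first (assumed values).
    static final String[] NAMES = {"day", "week", "month", "year"};
    static final long[] MAX_AGE_MS = {
        Duration.ofDays(1).toMillis(),
        Duration.ofDays(7).toMillis(),
        Duration.ofDays(30).toMillis(),
        Duration.ofDays(365).toMillis()
    };

    // Returns the first bucket whose bound exceeds the entry's age, or "older".
    static String bucketFor(long timestampMs, long nowMs) {
        long age = nowMs - timestampMs;
        for (int i = 0; i < NAMES.length; i++) {
            if (age < MAX_AGE_MS[i]) {
                return NAMES[i];
            }
        }
        return "older";
    }

    // At compaction time, whole groups named in 'expired' are simply not carried over.
    static List<String> groupsToKeep(List<String> groups, Set<String> expired) {
        List<String> kept = new ArrayList<>();
        for (String g : groups) {
            if (!expired.contains(g)) {
                kept.add(g);
            }
        }
        return kept;
    }
}
```

A scan for recent data would then read only the "day" group, and an
age-off policy would pass the expired group names to groupsToKeep.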

One thing that irks me about this: should column family and time
based locality groups be mutually exclusive (i.e. an RFile has one or
the other, not both)?  If they are not, then the order in which the
data is partitioned is important for query performance and would
probably need to be user configurable.

Thoughts?

Keith

Re: Time based locality groups

Posted by Billie J Rinaldi <bi...@ugov.gov>.

On Thursday, March 8, 2012 12:08:58 PM, "Adam Fuchs" <ad...@ugov.gov> wrote:
> On Thu, Mar 8, 2012 at 11:42 AM, Billie J Rinaldi <billie.j.rinaldi@ugov.gov> wrote:
>
> > There are iterators that change the column family filter set, so I'm wary
> > of automatically deciding which iterators can be pulled down into the file.
>
> Do you have specific examples of iterators that change the column family
> filter set? I think those are going to be the hard cases that we should
> consider.

The specific examples I've thought of so far are the IndexedDocIterator, the ChunkCombiner, and possibly the RowFilter, but generally it is anything that has to read more data than it returns in order to decide what to return.

Billie

Re: Time based locality groups

Posted by Adam Fuchs <ad...@ugov.gov>.

On Thu, Mar 8, 2012 at 11:42 AM, Billie J Rinaldi <billie.j.rinaldi@ugov.gov> wrote:

> There are iterators that change the column family filter set, so I'm wary
> of automatically deciding which iterators can be pulled down into the file.

Do you have specific examples of iterators that change the column family
filter set? I think those are going to be the hard cases that we should
consider.

If we think of this as purely an optimization with no changes to
correctness, then we can limit the automatic decisions to those things that
we understand well. Modifications to the API can help us cut out a bunch of
behaviors that we don't understand and are unnecessary and rarely used. If
those things that we do understand well cover most of the cases in practice
then this is probably a good approach, but I'd really like to see things
that don't fit the model.

Adam

Re: Time based locality groups

Posted by Billie J Rinaldi <bi...@ugov.gov>.

On Thursday, March 8, 2012 7:01:49 AM, "Adam Fuchs" <ad...@ugov.gov> wrote:
> Yes, yes, yes, this is going to be a very useful feature set! (I told Andie
> all about it and she agreed whole-heartedly)
>
> I think that step one needs to be figuring out how to expose this in the
> API, and the iterator interface is the place to start. Once we have defined
> an abstraction layer, we can experiment with lots of different
> implementations at the RFile layer. If we are going to broadly extend these
> locality group-type filtering optimizations, it might make sense to drop
> the specialization for column family filtering that is part of the
> SortedKeyValueIterator seek method. Then we could support column family
> filtering, timestamp filtering, cell-level security filtering, etc. as
> separate iterators. The specialization for column family filtering is our
> current mechanism for optimizing that operation in the RFile, but we could
> be a little smarter about how we do this.
>
> What I'm suggesting is that when we construct an iterator tree we look for
> iterators on top of the RFile reader that we can collapse and implement as
> part of the RFile reader. So, if a column family filtering iterator is on
> top of the RFile then we can grab its set of column families and replace it
> with the filtered RFile reader. If we add a little knowledge about
> commutativity of iterators then we can even collapse filters that are not
> directly on top of the RFile reader (like there might be a merging iterator
> between the RFile reader and the column family filtering iterator). One way
> we could implement this is by changing the factory method that generates
> iterators. When this method calls the init method on a newly constructed
> iterator it can instead push that iterator down through the tree and return
> the source iterator instead. We might be able to specialize the iterator
> environment to signal the optimization and avoid any changes to the API
> here.
>
> Once we get to the point of optimizing the RFile, I think what we might
> find is that the RFile entries are naturally grouped by time into blocks in
> many cases. A simple timestamp-based block filter might be optimal in these
> cases. This is what I was talking about with introducing extra features
> (timestamp ranges, etc) into the RFile index. I think it also makes sense
> to include some aggregate cell-level security markings here.
>
> One other thing to think about: I like the simpler iterator interface, but
> there are some implications to modifying the column family filter set
> during a query that might be tricky. Does anybody change the column family
> set mid-query now, anyway? Is that something we would want to support for
> timestamps or other filters?

There are iterators that change the column family filter set, so I'm wary of automatically deciding which iterators can be pulled down into the file.

Billie

Re: Time based locality groups

Posted by Adam Fuchs <ad...@ugov.gov>.

Yes, yes, yes, this is going to be a very useful feature set! (I told Andie
all about it and she agreed whole-heartedly)

I think that step one needs to be figuring out how to expose this in the
API, and the iterator interface is the place to start. Once we have defined
an abstraction layer, we can experiment with lots of different
implementations at the RFile layer. If we are going to broadly extend these
locality group-type filtering optimizations, it might make sense to drop
the specialization for column family filtering that is part of the
SortedKeyValueIterator seek method. Then we could support column family
filtering, timestamp filtering, cell-level security filtering, etc. as
separate iterators. The specialization for column family filtering is our
current mechanism for optimizing that operation in the RFile, but we could
be a little smarter about how we do this.

What I'm suggesting is that when we construct an iterator tree we look for
iterators on top of the RFile reader that we can collapse and implement as
part of the RFile reader. So, if a column family filtering iterator is on
top of the RFile then we can grab its set of column families and replace it
with the filtered RFile reader. If we add a little knowledge about
commutativity of iterators then we can even collapse filters that are not
directly on top of the RFile reader (like there might be a merging iterator
between the RFile reader and the column family filtering iterator). One way
we could implement this is by changing the factory method that generates
iterators. When this method calls the init method on a newly constructed
iterator it can instead push that iterator down through the tree and return
the source iterator instead. We might be able to specialize the iterator
environment to signal the optimization and avoid any changes to the API
here.
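
The collapsing step described above can be sketched with a toy model.
Everything here is hypothetical: Source, ToyFileReader, FamilyFilter and
Optimizer are stand-ins invented for illustration and are not Accumulo's
iterator classes. The sketch only handles the simplest case, a column
family filter sitting directly on top of an unfiltered reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy iterator model; each entry is {columnFamily, value}.
interface Source {
    List<String[]> read();
}

// Stand-in for the file reader; 'families' is the pushed-down filter (null = none).
final class ToyFileReader implements Source {
    final List<String[]> entries;
    final Set<String> families;

    ToyFileReader(List<String[]> entries, Set<String> families) {
        this.entries = entries;
        this.families = families;
    }

    public List<String[]> read() {
        List<String[]> out = new ArrayList<>();
        for (String[] e : entries) {
            if (families == null || families.contains(e[0])) {
                out.add(e);
            }
        }
        return out;
    }
}

// Stand-in for a column family filtering iterator layered on a source.
final class FamilyFilter implements Source {
    final Source source;
    final Set<String> families;

    FamilyFilter(Source source, Set<String> families) {
        this.source = source;
        this.families = families;
    }

    public List<String[]> read() {
        List<String[]> out = new ArrayList<>();
        for (String[] e : source.read()) {
            if (families.contains(e[0])) {
                out.add(e);
            }
        }
        return out;
    }
}

final class Optimizer {
    // Replaces filter-on-reader with a reader that applies the filter itself.
    static Source collapse(Source top) {
        if (top instanceof FamilyFilter) {
            FamilyFilter f = (FamilyFilter) top;
            if (f.source instanceof ToyFileReader) {
                ToyFileReader r = (ToyFileReader) f.source;
                if (r.families == null) {
                    return new ToyFileReader(r.entries, f.families);
                }
            }
        }
        return top; // anything not understood is left untouched
    }
}
```

Leaving unrecognized trees untouched keeps this a pure optimization: the
collapsed tree returns the same entries as the original one.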

Once we get to the point of optimizing the RFile, I think what we might
find is that the RFile entries are naturally grouped by time into blocks in
many cases. A simple timestamp-based block filter might be optimal in these
cases. This is what I was talking about with introducing extra features
(timestamp ranges, etc) into the RFile index. I think it also makes sense
to include some aggregate cell-level security markings here.
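
A minimal sketch of a timestamp-based block filter, assuming each index
entry carries the minimum and maximum timestamp of its block. The
BlockIndexEntry fields and the TimestampBlockFilter class are hypothetical
names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index entry carrying the extra timestamp range discussed above.
final class BlockIndexEntry {
    final int blockId;
    final long minTs;
    final long maxTs;

    BlockIndexEntry(int blockId, long minTs, long maxTs) {
        this.blockId = blockId;
        this.minTs = minTs;
        this.maxTs = maxTs;
    }

    // True if the block may contain entries in the inclusive range [startTs, endTs].
    boolean overlaps(long startTs, long endTs) {
        return maxTs >= startTs && minTs <= endTs;
    }
}

final class TimestampBlockFilter {
    // Returns the ids of the blocks a time-bounded scan still has to read.
    static List<Integer> blocksToRead(List<BlockIndexEntry> index, long startTs, long endTs) {
        List<Integer> ids = new ArrayList<>();
        for (BlockIndexEntry e : index) {
            if (e.overlaps(startTs, endTs)) {
                ids.add(e.blockId);
            }
        }
        return ids;
    }
}
```

The filter only pays off when blocks have narrow timestamp ranges, which
is the "naturally grouped by time" case; blocks with wide ranges are
always read.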

One other thing to think about: I like the simpler iterator interface, but
there are some implications to modifying the column family filter set
during a query that might be tricky. Does anybody change the column family
set mid-query now, anyway? Is that something we would want to support for
timestamps or other filters?

Adam

Re: Time based locality groups

Posted by Keith Turner <ke...@deenlo.com>.

I was thinking of something like the following.  A locality group in
an RFile would consist of arbitrary locality group metadata and
key/value pairs.

interface Partitioner {

    void init(LocalityGroupConfig lgc);

    // Determines which locality groups a compaction should create in the
    // output RFile.  This method receives metadata about the locality
    // groups in the files being compacted.
    List<LocalityGroupInfo> getLocalityGroupsToCreate(List<LocalityGroupMetadata> lgml);

    // The following three methods are used to write data into a new RFile
    // locality group.
    void startLocalityGroup(LocalityGroupInfo lgi);

    // All data is passed through this method.  It serves two purposes: it
    // decides whether the data goes into the locality group at all, and
    // for the data that is accepted it builds up the metadata for the
    // locality group being created.
    boolean acceptKeyValue(Key k, Value v);

    // Once all data is written, ask for the metadata and write it to the
    // RFile.
    LocalityGroupMetadata finishLocalityGroup();

    // Selects which locality groups in an RFile should be read by a scan
    // or compaction.  This method is passed info about the existing
    // locality groups in the RFile.
    List<LocalityGroupInfo> getLocalityGroupsToRead(List<LocalityGroupMetadata> lgml, ScanOptions so);
}
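
A minimal sketch of how a time-window implementation of the write and
read methods might behave. It is simplified so that it runs on its own:
GroupInfo and GroupMetadata are hypothetical stand-ins for
LocalityGroupInfo and LocalityGroupMetadata, a bare timestamp stands in
for Key and Value, and a pair of longs stands in for ScanOptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LocalityGroupInfo: a named half-open window [startTs, endTs).
final class GroupInfo {
    final String name;
    final long startTs;
    final long endTs;

    GroupInfo(String name, long startTs, long endTs) {
        this.name = name;
        this.startTs = startTs;
        this.endTs = endTs;
    }
}

// Hypothetical stand-in for LocalityGroupMetadata: the timestamp range actually written.
final class GroupMetadata {
    final String name;
    final long minTs;
    final long maxTs;
    final long count;

    GroupMetadata(String name, long minTs, long maxTs, long count) {
        this.name = name;
        this.minTs = minTs;
        this.maxTs = maxTs;
        this.count = count;
    }
}

final class TimeWindowPartitioner {
    private GroupInfo current;
    private long minTs;
    private long maxTs;
    private long count;

    void startLocalityGroup(GroupInfo info) {
        current = info;
        minTs = Long.MAX_VALUE;
        maxTs = Long.MIN_VALUE;
        count = 0;
    }

    // Accepts an entry only if it falls in the current window, and tracks
    // the observed range as the group's metadata.
    boolean acceptKeyValue(long timestamp) {
        if (timestamp < current.startTs || timestamp >= current.endTs) {
            return false;
        }
        minTs = Math.min(minTs, timestamp);
        maxTs = Math.max(maxTs, timestamp);
        count++;
        return true;
    }

    GroupMetadata finishLocalityGroup() {
        return new GroupMetadata(current.name, minTs, maxTs, count);
    }

    // Selects non-empty groups whose range overlaps the inclusive scan range.
    static List<String> getLocalityGroupsToRead(List<GroupMetadata> groups, long scanStart, long scanEnd) {
        List<String> names = new ArrayList<>();
        for (GroupMetadata m : groups) {
            if (m.count > 0 && m.maxTs >= scanStart && m.minTs <= scanEnd) {
                names.add(m.name);
            }
        }
        return names;
    }
}
```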

Keith

Re: Time based locality groups

Posted by Eric Newton <er...@gmail.com>.

Something like this:

    partition, meta = partitioner.choose(key, value, meta)

The partition can be a string, which is used to look up the partition's
configuration.  The meta information can be used by queries to avoid
reading files from the partition.  The metadata would be saved at the
close of the file.

During a query, files could be filtered based on some arbitrary query data:

    files = partitioner.selectFiles(files, query)

I like it! It might also be nice to indicate some sort of "estimated"
percent of keys processed, and the type of compaction occurring (flush,
partial, everything):

    partition, meta = partitioner.choose(key, value, meta, percent, compactionType)

Is there any other tablet-level information we might want to provide to a
partitioner?  Perhaps the source partition of the key/value?
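
The two calls above could look roughly like this in Java. This is a
sketch under assumptions: partitions are calendar months in UTC, the
metadata is a min/max timestamp per partition or per file, and Choice and
MonthlyPartitioner are hypothetical names. Java has no tuple return, so
the (partition, meta) pair is wrapped in a small class.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result of choose(): the partition name plus the updated metadata.
final class Choice {
    final String partition;
    final Map<String, long[]> meta; // partition -> {minTimestamp, maxTimestamp}

    Choice(String partition, Map<String, long[]> meta) {
        this.partition = partition;
        this.meta = meta;
    }
}

final class MonthlyPartitioner {
    // Picks the month partition for a timestamp and widens that partition's range.
    static Choice choose(long timestampMs, Map<String, long[]> meta) {
        String partition =
            YearMonth.from(Instant.ofEpochMilli(timestampMs).atZone(ZoneOffset.UTC)).toString();
        Map<String, long[]> updated = new HashMap<>(meta);
        long[] old = meta.get(partition);
        long min = old == null ? timestampMs : Math.min(old[0], timestampMs);
        long max = old == null ? timestampMs : Math.max(old[1], timestampMs);
        updated.put(partition, new long[] {min, max});
        return new Choice(partition, updated);
    }

    // Keeps files whose saved range overlaps the query; files without metadata are kept.
    static List<String> selectFiles(List<String> files, Map<String, long[]> rangeByFile,
                                    long queryStart, long queryEnd) {
        List<String> selected = new ArrayList<>();
        for (String f : files) {
            long[] r = rangeByFile.get(f);
            if (r == null || (r[1] >= queryStart && r[0] <= queryEnd)) {
                selected.add(f);
            }
        }
        return selected;
    }
}
```

Keeping files that have no metadata is the conservative choice: a file
can only be skipped when its saved range proves it is irrelevant.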

-Eric

Re: Time based locality groups

Posted by Keith Turner <ke...@deenlo.com>.

Replying to myself :)

The more I think about this, it seems that locality groups could be
handled by plugins that can partition the data and select locality
groups in any way they like.  Want locality groups based on row suffix?
Go ahead and write the plugin.

The plugin would be used for compaction time partitioning and scan
time locality group selection.  Users could pass options to the
locality group plugin at scan time just like options are passed to
iterators.  Maybe this is an extension or further generalization of
the existing iterator framework; I have not thought that through far
enough.
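
A minimal sketch of the row suffix example, assuming string rows and
string-to-string option maps like the ones iterators take. The class
RowSuffixGrouper and the option names "separator" and "groups" are made
up for illustration.

```java
import java.util.Map;

// Hypothetical plugin: groups entries by the part of the row after the last separator.
final class RowSuffixGrouper {
    private final char separator;

    // Table-level options; "separator" is an assumed option name, defaulting to '_'.
    RowSuffixGrouper(Map<String, String> options) {
        String sep = options.getOrDefault("separator", "_");
        this.separator = sep.isEmpty() ? '_' : sep.charAt(0);
    }

    // Compaction time: decide which locality group a row belongs to.
    String groupFor(String row) {
        int i = row.lastIndexOf(separator);
        return i < 0 ? "default" : row.substring(i + 1);
    }

    // Scan time: "groups" is an assumed comma-separated option; absent means read everything.
    boolean shouldRead(String group, Map<String, String> scanOptions) {
        String wanted = scanOptions.get("groups");
        if (wanted == null) {
            return true;
        }
        for (String g : wanted.split(",")) {
            if (g.trim().equals(group)) {
                return true;
            }
        }
        return false;
    }
}
```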

Keith
