Posted to user@hbase.apache.org by Andrew Olson <no...@gmail.com> on 2012/10/04 22:33:43 UTC

Issue with column-counting filters accepting multiple versions of a column

It looks like the max version limit for a table or scanner is not applied
to discard older versions before columns are counted within a
ColumnPaginationFilter or ColumnCountGetFilter. As a result, if multiple
versions of a column have been added to a row, a Scan or Get can retrieve
fewer than the requested number of columns even when enough columns exist
to satisfy the request.

A minimal test case demonstrating this behavior can be found here:
https://gist.github.com/3836132

The javadoc for Get mentions 'Only Filter.filterKeyValue(KeyValue) is
called AFTER all tests for ttl, column match, deletes and *max versions*
have been run.'; for these two filters this behavior does not appear to be
true, as flattening of multiple versions appears to occur after the filter
has been applied.
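
The ordering issue described above can be illustrated with a small
self-contained model. This is only a sketch and does not use HBase: the
Cell class, the method names, and the sample row are all hypothetical, and
the counting step is a simplified stand-in for a column-counting filter
that counts every cell it is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class ColumnCountDemo {

    /** Toy stand-in for one stored cell; NOT HBase's KeyValue (assumption). */
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Keeps at most maxVersions cells per column; input is newest-first within a column. */
    static List<Cell> keepNewestVersions(List<Cell> cells, int maxVersions) {
        Map<String, Integer> seen = new HashMap<>();
        List<Cell> kept = new ArrayList<>();
        for (Cell cell : cells) {
            int count = seen.merge(cell.qualifier, 1, Integer::sum);
            if (count <= maxVersions) {
                kept.add(cell);
            }
        }
        return kept;
    }

    /** Simplified counting filter: accepts the first `limit` cells it is shown. */
    static List<Cell> takeFirst(List<Cell> cells, int limit) {
        return new ArrayList<>(cells.subList(0, Math.min(limit, cells.size())));
    }

    /** Distinct column qualifiers, in the order they first appear. */
    static List<String> qualifiers(List<Cell> cells) {
        LinkedHashSet<String> names = new LinkedHashSet<>();
        for (Cell cell : cells) {
            names.add(cell.qualifier);
        }
        return new ArrayList<>(names);
    }

    /** Order reported in this thread: filter first, version limit second. */
    static List<String> filterThenVersions(List<Cell> cells, int limit, int maxVersions) {
        return qualifiers(keepNewestVersions(takeFirst(cells, limit), maxVersions));
    }

    /** Order the javadoc quote suggests: version limit first, filter second. */
    static List<String> versionsThenFilter(List<Cell> cells, int limit, int maxVersions) {
        return qualifiers(takeFirst(keepNewestVersions(cells, maxVersions), limit));
    }

    /** Hypothetical row: column "a" has three versions, "b" and "c" have one each. */
    static List<Cell> sampleRow() {
        return Arrays.asList(
            new Cell("a", 3), new Cell("a", 2), new Cell("a", 1),
            new Cell("b", 1), new Cell("c", 1));
    }

    public static void main(String[] args) {
        List<Cell> row = sampleRow();
        // Ask for 2 columns with a max version limit of 1.
        System.out.println(filterThenVersions(row, 2, 1)); // prints [a]
        System.out.println(versionsThenFilter(row, 2, 1)); // prints [a, b]
    }
}
```

In this model the filter-first order spends both of its two slots on
versions of column "a", so only one column survives once older versions
are dropped, even though three columns exist in the row.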

Should this be considered a bug? If so, are there any possible workarounds
besides implementing and deploying a custom Filter class?

thanks,
Andrew

Re: Issue with column-counting filters accepting multiple versions of a column

Posted by lars hofhansl <lh...@yahoo.com>.
Filters are applied before the version counting is performed.
This is a frequent area of contention. If filters were applied after the version counting, other folks would complain for other reasons (and they have: in the early days filters were in fact evaluated after the version counting, which is why it was changed).

Unless we allow a filter to declare whether it needs to be run before or after the version counting, we will always have an unhappy party :(
(I started thinking about this in HBASE-5257 but abandoned that for lack of interest)
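
As a rough illustration of what "a filter declaring when it runs" could
look like, here is a toy sketch. It is not HBase code and does not
describe what HBASE-5257 actually proposed: the Phase enum, the
CountingFilter class, and readRow are all hypothetical names. Each string
in the row stands for the column qualifier of one stored version, newest
first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterPhaseDemo {

    /** Hypothetical: when a filter asks to run relative to version counting. */
    enum Phase { BEFORE_VERSION_COUNTING, AFTER_VERSION_COUNTING }

    /** Hypothetical filter that accepts the first `limit` cells it is shown. */
    static final class CountingFilter {
        final int limit;
        final Phase phase;

        CountingFilter(int limit, Phase phase) {
            this.limit = limit;
            this.phase = phase;
        }

        List<String> apply(List<String> cells) {
            return new ArrayList<>(cells.subList(0, Math.min(limit, cells.size())));
        }
    }

    /** Keeps at most maxVersions entries per qualifier, in input order. */
    static List<String> limitVersions(List<String> cells, int maxVersions) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String qualifier : cells) {
            if (seen.merge(qualifier, 1, Integer::sum) <= maxVersions) {
                kept.add(qualifier);
            }
        }
        return kept;
    }

    /** The read path consults the filter's declared phase to pick the order. */
    static List<String> readRow(List<String> cells, int maxVersions, CountingFilter filter) {
        if (filter.phase == Phase.AFTER_VERSION_COUNTING) {
            return filter.apply(limitVersions(cells, maxVersions));
        }
        return limitVersions(filter.apply(cells), maxVersions);
    }

    public static void main(String[] args) {
        // Column "a" has three versions; "b" and "c" have one each.
        List<String> row = Arrays.asList("a", "a", "a", "b", "c");
        System.out.println(readRow(row, 1,
            new CountingFilter(2, Phase.BEFORE_VERSION_COUNTING))); // prints [a]
        System.out.println(readRow(row, 1,
            new CountingFilter(2, Phase.AFTER_VERSION_COUNTING)));  // prints [a, b]
    }
}
```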


-- Lars




RE: Issue with column-counting filters accepting multiple versions of a column

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
Seems to be a bug to me.  Can you file a JIRA on this?

Regards
Ram



Re: Issue with column-counting filters accepting multiple versions of a column

Posted by Andrew Olson <no...@gmail.com>.
Jira filed: https://issues.apache.org/jira/browse/HBASE-6954