Posted to dev@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2013/10/16 21:51:20 UTC

ColumnPaginationFilter called twice per line?

Is anyone using filters to filter versions on a single row?

I looked at the ColumnPaginationFilter code and it's clean and very small.
But on the client side, when I ask for the first 100 versions of a row/CF/C,
I only get the first 50. If I do a scan from the shell, I get the 10,000
versions correctly. If I do a scan from the client without the filter, I
get the 10K versions.

I tried with 0.94.12.

Since it doesn't seem to be related to the ColumnPaginationFilter code, I
will start looking at the RS side to see how it's called, but I'm
wondering if anyone uses that or has already seen that.

Any pointers are welcome too.

JM

Re: ColumnPaginationFilter called twice per line?

Posted by Varun Sharma <va...@pinterest.com>.
Not really - IIRC, ColumnPaginationFilter was broken prior to this fix - it
was doing some incorrect version counting, and it had to do with the way
version tracking and filtering were entangled (I forget the exact
issue).

Varun


On Fri, Oct 18, 2013 at 1:46 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Varun,
>
> Thanks for the pointer. Do you recall any particular version-related
> work around 5257? I found a few simple ways to fix that, but I'm not 100%
> sure of the impact on the other use cases. Also, I looked at the test
> cases and I think we should add more to them.
>
> Thanks,
>
> JM

Re: ColumnPaginationFilter called twice per line?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Varun,

Thanks for the pointer. Do you recall any particular version-related
work around 5257? I found a few simple ways to fix that, but I'm not 100%
sure of the impact on the other use cases. Also, I looked at the test
cases and I think we should add more to them.

Thanks,

JM

On Thursday, October 17, 2013, Varun Sharma wrote:

> There is some history in HBASE-5257 - that's where
> INCLUDE_AND_SEEK_NEXT_COL was introduced.

Re: ColumnPaginationFilter called twice per line?

Posted by Varun Sharma <va...@pinterest.com>.
There is some history in HBASE-5257 - that's where
INCLUDE_AND_SEEK_NEXT_COL was introduced.


On Wed, Oct 16, 2013 at 9:21 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> So. Here are more details.
>
> The "issue" is that ScanQueryMatcher returns INCLUDE_AND_SEEK_NEXT_COL
> and not INCLUDE for this specific case, while it should return INCLUDE. We
> don't want to seek to the next column. I still have some difficulty
> understanding everything this code is doing, but I will keep looking.
>
> JM

Re: ColumnPaginationFilter called twice per line?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
So. Here are more details.

The "issue" is that ScanQueryMatcher returns INCLUDE_AND_SEEK_NEXT_COL
and not INCLUDE for this specific case, while it should return INCLUDE. We
don't want to seek to the next column. I still have some difficulty
understanding everything this code is doing, but I will keep looking.
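
To make the difference between the two match codes concrete, here is a
small toy model in plain Java. It is not ScanQueryMatcher and does not use
any HBase class: the Cell type, the scan loop and the sample row are all
hypothetical, and the behaviour given to each code is only my reading of
their names (INCLUDE keeps the cell and moves to the next one,
INCLUDE_AND_SEEK_NEXT_COL keeps the cell and then skips the remaining
versions of that column).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT ScanQueryMatcher. Cell layout and method names are hypothetical.
class MatchCodeDemo {

    enum MatchCode { INCLUDE, INCLUDE_AND_SEEK_NEXT_COL }

    /** A cell reduced to its column name and version number. */
    static final class Cell {
        final String column;
        final int version;

        Cell(String column, int version) {
            this.column = column;
            this.version = version;
        }
    }

    /** Walks cells sorted by column, newest version first, applying the same code to every cell. */
    static List<Cell> scan(List<Cell> cells, MatchCode code) {
        List<Cell> out = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            Cell current = cells.get(i);
            out.add(current);
            i++;
            if (code == MatchCode.INCLUDE_AND_SEEK_NEXT_COL) {
                // Skip the remaining versions of the column we just included.
                while (i < cells.size() && cells.get(i).column.equals(current.column)) {
                    i++;
                }
            }
        }
        return out;
    }

    /** Column A with versions 3..1, then column B with versions 2..1. */
    static List<Cell> sampleRow() {
        List<Cell> cells = new ArrayList<>();
        for (int v = 3; v >= 1; v--) cells.add(new Cell("A", v));
        for (int v = 2; v >= 1; v--) cells.add(new Cell("B", v));
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleRow(), MatchCode.INCLUDE).size());                   // 5
        System.out.println(scan(sampleRow(), MatchCode.INCLUDE_AND_SEEK_NEXT_COL).size()); // 2
    }
}
```

In this model, a scan that wants every version of a column only gets them
all when the code is INCLUDE; with the seek variant it gets one cell per
column.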

JM


2013/10/16 Jean-Marc Spaggiari <je...@spaggiari.org>

> Ok. Confirmed. It's called twice:
>
> 2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9990
> 2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9990
> 2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9989
> 2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9989
> 2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9988
> 2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9988
>
> Method filterKeyValue is called twice per cell version. I will try to
> figure out why.
>
> JM

Re: ColumnPaginationFilter called twice per line?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Ok. Confirmed. It's called twice:

2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9990
2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9990
2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9989
2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9989
2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9988
2013-10-16 18:45:40,819 INFO org.apache.hadoop.hbase.filter.ColumnPaginationFilter: A 9988

Method filterKeyValue is called twice per cell version. I will try to
figure out why.
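
As a rough illustration of why a double call could turn "100 requested"
into "50 returned", here is a toy model in plain Java. It is not the HBase
source: CountingPageFilter and scan are hypothetical names, and the only
assumption is a limit/offset filter that advances an internal counter on
every call. The thread does not confirm that this is the actual cause; it
is just one reading that is consistent with the numbers above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the HBase source. All names here are hypothetical.
class PaginationDoubleCallDemo {

    /** A limit/offset filter that advances its counter on every call. */
    static final class CountingPageFilter {
        private final int limit;
        private final int offset;
        private int count = 0;

        CountingPageFilter(int limit, int offset) {
            this.limit = limit;
            this.offset = offset;
        }

        /** Returns true if the cell should be kept; each accepted call uses one slot. */
        boolean accept() {
            if (count >= offset + limit) {
                return false;
            }
            boolean keep = count >= offset;
            count++;
            return keep;
        }
    }

    /** Feeds `versions` cells to the filter, calling it `callsPerCell` times per cell. */
    static List<Integer> scan(int versions, int limit, int callsPerCell) {
        CountingPageFilter filter = new CountingPageFilter(limit, 0);
        List<Integer> kept = new ArrayList<>();
        for (int version = versions; version >= 1; version--) {
            boolean keep = true;
            for (int call = 0; call < callsPerCell; call++) {
                keep &= filter.accept(); // non-short-circuit: every call hits the counter
            }
            if (keep) {
                kept.add(version);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(scan(10_000, 100, 1).size()); // 100: one call per version
        System.out.println(scan(10_000, 100, 2).size()); // 50: two calls per version
    }
}
```

With one call per version the toy filter keeps 100 versions; with two
calls per version the counter reaches the limit twice as fast and only 50
are kept.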

JM



2013/10/16 Jean-Marc Spaggiari <je...@spaggiari.org>

> Is anyone using filters to filter versions on a single row?
>
> I looked at the ColumnPaginationFilter code and it's clean and very small.
> But on the client side, when I ask for the first 100 versions of a row/CF/C,
> I only get the first 50. If I do a scan from the shell, I get the 10,000
> versions correctly. If I do a scan from the client without the filter, I
> get the 10K versions.
>
> I tried with 0.94.12.
>
> Since it doesn't seem to be related to the ColumnPaginationFilter code, I
> will start looking at the RS side to see how it's called, but I'm
> wondering if anyone uses that or has already seen that.
>
> Any pointers are welcome too.
>
> JM
>