Posted to dev@hbase.apache.org by lars hofhansl <lh...@yahoo.com> on 2011/10/22 00:49:36 UTC

Strange performance behavior of SingleValColumnFilter

We have been doing some performance testing on HBase filters. One outcome was HBASE-4626 (which I fixed and committed last night).

Now we found a rather strange behavior with SingleColumnValueFilter. On our test cluster it is 10x slower than ValueFilter, even when we restrict the scan to just the one column we are filtering on and set filterIfMissing to true.
We are not seeing that with HBase in local mode, which points to some additional activity on the FS, which in HDFS would be slow compared to a local FS.


Indeed, it turns out the problem goes away when we replace all NEXT_ROW with SKIP in SingleColumnValueFilter.filterKeyValue: the performance is then *much* better (on par with ValueFilter).


We're using something pretty close to trunk for our tests.
The tables are pretty wide, with only one version of each cell (and freshly major compacted).


I do not know this part of the code that well (yet) and was wondering if somebody could chime in. Maybe this is related to HFileV2?

I do recall there was something done to optimize reseeks. Generally I would have expected NEXT_ROW to be a major performance improvement.

Any ideas, comments, pointers?

Thanks.

-- Lars

Re: Strange performance behavior of SingleColumnValueFilter

Posted by Stack <st...@duboce.net>.

Yes.  Should be off by default.
St.Ack

On Wed, Oct 26, 2011 at 10:43 AM, lars hofhansl <lh...@yahoo.com> wrote:
> Should there be an option to disable data block caching and only allow index block caching?
> For some analytical setups that might make sense.
> (obviously, the same can be achieved by setting cacheBlocks to false in every Scan object)

Re: Strange performance behavior of SingleColumnValueFilter

Posted by lars hofhansl <lh...@yahoo.com>.

Should there be an option to disable data block caching and only allow index block caching?
For some analytical setups that might make sense.
(obviously, the same can be achieved by setting cacheBlocks to false in every Scan object)




Re: Strange performance behavior of SingleColumnValueFilter

Posted by lars hofhansl <lh...@yahoo.com>.

It turns out that, left over from other tests we did, we had a stray


<property>
    <name>hfile.block.cache.size</name>
    <value>0</value>
</property>


in our config. D'oh...

When we removed that, the performance of SCVF was on par with ValueFilter.

Setting cacheBlocks on the Scan object had almost no effect, so this must be related
to the caching of index blocks.
It seems NEXT_ROW forces re-reading of index blocks, whereas SKIP does not.

So in summary:
When hfile.block.cache.size=0, returning NEXT_ROW from a ScanQueryMatcher can be significantly slower than returning SKIP.
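
If the index-block explanation is right, the effect can be illustrated with a toy cache model. This is only a sketch under that assumption, not HBase's block cache: ToyBlockCache, readBlock and diskReadsFor are made-up names, the cache capacity is counted in blocks rather than as a heap fraction, and every reseek is assumed to consult the same single index block.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU block cache. Hypothetical model for illustration; not HBase code. */
public class ToyBlockCache {
    private final Map<String, Boolean> lru;
    public int diskReads = 0; // number of blocks that had to be read from "disk"

    public ToyBlockCache(final int capacityInBlocks) {
        // Access-ordered LinkedHashMap that evicts the eldest entry once the
        // capacity is exceeded. With capacity 0, nothing ever stays cached.
        this.lru = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacityInBlocks;
            }
        };
    }

    /** Reads a block, counting a disk read whenever it is not in the cache. */
    public void readBlock(String blockId) {
        if (lru.get(blockId) == null) {
            diskReads++;
            lru.put(blockId, Boolean.TRUE);
        }
    }

    /** Simulates `reseeks` reseeks that each consult the same index block. */
    public static int diskReadsFor(int capacityInBlocks, int reseeks) {
        ToyBlockCache cache = new ToyBlockCache(capacityInBlocks);
        for (int i = 0; i < reseeks; i++) {
            cache.readBlock("index-block-0");
        }
        return cache.diskReads;
    }

    public static void main(String[] args) {
        System.out.println("cache size 0: " + diskReadsFor(0, 1000) + " disk reads");
        System.out.println("cache size 1: " + diskReadsFor(1, 1000) + " disk reads");
    }
}
```

In this model a cache of size 0 pays one disk read per reseek, while even a one-block cache pays only once. A scan that just steps to the next KV never consults the index in the model, which would match SKIP being unaffected.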

-- Lars



Re: Strange performance behavior of SingleValColumnFilter

Posted by lars hofhansl <lh...@yahoo.com>.

Thanks N.

I do not think the time is lost in the memstore. We're working with fully compacted
tables and do no updates during the read testing.

We'll spend more time tracking this down on Monday.


-- Lars


Re: Strange performance behavior of SingleValColumnFilter

Posted by N Keywal <nk...@gmail.com>.

Hi,

I made a change in this area recently. It was meant to fix a consistency bug
rather than to improve performance, but in my tests performance actually
improved as well. It only affected the MemStore. Is the time lost in the memstore
or in the persistence-related part?

Cheers,

N.


Re: Strange performance behavior of SingleValColumnFilter

Posted by lars hofhansl <lh...@yahoo.com>.

No, it was a trunk build. I did the local tests with a build from today.
Our test cluster is running a build that is 1 or 2 weeks old.

It seems it is just much cheaper to scan through a block that we already have, or even to scan into the next block, than to reseek.




Re: Strange performance behavior of SingleValColumnFilter

Posted by Ted Yu <yu...@gmail.com>.

Was the following evaluation performed on 0.92?
Also, I assume you use ROWCOL bloom filter.
In TRUNK, Mikhail has put in lazy seek which I think should help
performance.

Cheers


Re: Strange performance behavior of SingleValColumnFilter

Posted by lars hofhansl <lh...@yahoo.com>.

We found that even with many columns, and even when the filter matches the first column, SKIP is still faster than NEXT_ROW.
So either the reseek is extremely inefficient, or there is something else at play.

It might be worthwhile to have StoreScanner, upon SEEK_NEXT_ROW, try the next N KVs (maybe N=10 or 20, or even more) to see if we
get to the next row, and do the reseek only if we didn't.
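
To make the look-ahead idea concrete, here is a minimal, self-contained Java sketch. It is a toy model and not StoreScanner code: the class LookAheadNextRow, its methods, and the representation of KVs as a sorted list of row keys are all made up for illustration, and a binary search merely stands in for the real reseek.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "try the next N KVs before reseeking". Hypothetical; not HBase code. */
public class LookAheadNextRow {
    private final List<String> rowKeys; // one entry per KV, sorted by row key
    private final int maxSteps;         // the N from the proposal
    public int reseeks = 0;             // how often the expensive fallback ran

    public LookAheadNextRow(List<String> rowKeys, int maxSteps) {
        this.rowKeys = rowKeys;
        this.maxSteps = maxSteps;
    }

    /** Index of the first KV of the row after the one at pos, or rowKeys.size() if none. */
    public int advanceToNextRow(int pos) {
        String currentRow = rowKeys.get(pos);
        int i = pos + 1;
        // Cheap path: step over at most maxSteps KVs one at a time.
        for (int steps = 0; steps < maxSteps && i < rowKeys.size(); steps++, i++) {
            if (!rowKeys.get(i).equals(currentRow)) {
                return i; // reached the next row without a reseek
            }
        }
        if (i >= rowKeys.size()) {
            return rowKeys.size(); // ran off the end while stepping
        }
        // Expensive path: a binary search stands in for a real reseek.
        reseeks++;
        int lo = i;
        int hi = rowKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rowKeys.get(mid).compareTo(currentRow) > 0) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** Builds a sorted table of `rows` rows with `kvsPerRow` KVs each. */
    public static List<String> buildTable(int rows, int kvsPerRow) {
        List<String> keys = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int k = 0; k < kvsPerRow; k++) {
                keys.add(String.format("row%04d", r));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> table = buildTable(3, 5);
        LookAheadNextRow patient = new LookAheadNextRow(table, 10);
        LookAheadNextRow impatient = new LookAheadNextRow(table, 2);
        System.out.println(patient.advanceToNextRow(0) + " reseeks=" + patient.reseeks);
        System.out.println(impatient.advanceToNextRow(0) + " reseeks=" + impatient.reseeks);
    }
}
```

With rows of 5 KVs, N=10 always reaches the next row by stepping, while N=2 falls back to the reseek. What value of N pays off in practice would depend on how expensive a real reseek is relative to a next.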


Re: Strange performance behavior of SingleValColumnFilter

Posted by lars hofhansl <lh...@yahoo.com>.

Maybe it even makes sense. When the scan is limited to one column and there is only one version, SKIP would skip to the next row.
But 10x slower for NEXT_ROW seems extreme.
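
A small simulation shows why the two return codes touch the same KVs in that case. This is a toy model, not HBase's scanner: SkipVersusNextRow, its two-value ReturnCode enum, and the Counts class are invented for illustration, the filter is assumed to reject every KV, and the "reseek" is only counted, not priced.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts scanner operations for SKIP versus NEXT_ROW. Hypothetical; not HBase code. */
public class SkipVersusNextRow {
    /** Made-up stand-in for the two filter return codes discussed in the thread. */
    public enum ReturnCode { SKIP, NEXT_ROW }

    public static final class Counts {
        public int kvsExamined = 0; // KVs handed to the filter
        public int nexts = 0;       // cheap single-KV advances
        public int reseeks = 0;     // jumps to the start of the next row
    }

    /** Sorted row keys, one entry per KV: `rows` rows with `kvsPerRow` KVs each. */
    public static List<String> buildRowKeys(int rows, int kvsPerRow) {
        List<String> keys = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int k = 0; k < kvsPerRow; k++) {
                keys.add(String.format("row%05d", r));
            }
        }
        return keys;
    }

    /** Scans all KVs with a filter that answers `code` for every KV it sees. */
    public static Counts scan(List<String> rowKeys, ReturnCode code) {
        Counts counts = new Counts();
        int pos = 0;
        while (pos < rowKeys.size()) {
            counts.kvsExamined++;
            if (code == ReturnCode.SKIP) {
                counts.nexts++;
                pos++; // move to the very next KV
            } else {
                counts.reseeks++;
                String row = rowKeys.get(pos);
                // Jump past the rest of this row (modelled here as a plain walk).
                do {
                    pos++;
                } while (pos < rowKeys.size() && rowKeys.get(pos).equals(row));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> narrow = buildRowKeys(1000, 1);
        System.out.println("SKIP examined " + scan(narrow, ReturnCode.SKIP).kvsExamined);
        System.out.println("NEXT_ROW examined " + scan(narrow, ReturnCode.NEXT_ROW).kvsExamined);
    }
}
```

With one KV per row, both codes examine exactly one KV per row, so NEXT_ROW saves nothing and only adds a reseek per row. With many KVs per row, NEXT_ROW examines far fewer KVs, which is why it would normally be expected to win.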
