Posted to dev@hbase.apache.org by Vladimir Rodionov <vr...@carrieriq.com> on 2013/10/14 20:18:16 UTC

Scanner with explicit columns list is very slow

This is 0.94.6, and there is a chance that the issue has already been fixed.

Simple table: one column family + one qualifier

Two types of scans:

1. Scan.addFamily(CF)

2. Scan.addColumn(CF, CQ)

Both run on block cache (all data in memory)

Tested on StoreScanner directly.

1. 4.2M KVs per second per thread
2. 1.5M KVs per second per thread

The difference? The first scanner's ScanQueryMatcher returns INCLUDE, DONE; the second returns INCLUDE_NEXT_ROW, DONE.
The cost of the row reseek is huge.
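
The difference between the two match-code sequences can be sketched with a toy model. This is not HBase code: the class ToyMatchCodes, its Counts type, and the way operations are counted are assumptions made for illustration only. It shows that with one KV per row, INCLUDE_NEXT_ROW issues one reseek per row yet skips nothing, so any extra cost of a reseek over a plain step is pure overhead.

```java
/** Toy model, not HBase code: counts scanner operations for two match-code sequences. */
class ToyMatchCodes {
    enum MatchCode { INCLUDE, INCLUDE_NEXT_ROW }

    /** Operation counts for one pass over the data. */
    static final class Counts {
        int included; // KVs returned to the caller
        int nexts;    // cheap "step to the following KV" operations
        int reseeks;  // "seek to the start of the next row" operations
    }

    /** Every row holds kvsPerRow KVs; the matcher answers with the same code for each KV it sees. */
    static Counts scan(int rows, int kvsPerRow, MatchCode code) {
        Counts c = new Counts();
        for (int row = 0; row < rows; row++) {
            int kv = 0;
            while (kv < kvsPerRow) {
                c.included++;
                if (code == MatchCode.INCLUDE) {
                    c.nexts++;      // step to the following KV
                    kv++;
                } else {
                    c.reseeks++;    // jump past whatever is left of this row
                    kv = kvsPerRow;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Counts family = scan(1000, 1, MatchCode.INCLUDE);
        Counts column = scan(1000, 1, MatchCode.INCLUDE_NEXT_ROW);
        System.out.println("addFamily-like: nexts=" + family.nexts + " reseeks=" + family.reseeks);
        System.out.println("addColumn-like: nexts=" + column.nexts + " reseeks=" + column.reseeks);
    }
}
```

Both runs return the same number of KVs; only the kind of operation used to advance differs.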

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

Re: Scanner with explicit columns list is very slow

Posted by Vladimir Rodionov <vl...@gmail.com>.

https://issues.apache.org/jira/browse/HBASE-9769


On Tue, Oct 15, 2013 at 11:19 AM, Vladimir Rodionov
<vl...@gmail.com> wrote:

> I did some tweaking to ExplicitColumnsFilter (formerly
> ExplicitScanReplacementFilter) and am now getting 1.45M rows per sec (a > 350%
> speedup compared to the default implementation). The raw scan (no columns
> specified) runs at 1.6M rows per sec, so it's only a 10% performance hit.
> The Phoenix team should be happy today. I am opening a JIRA.
>

Re: Scanner with explicit columns list is very slow

Posted by Vladimir Rodionov <vl...@gmail.com>.

I did some tweaking to ExplicitColumnsFilter (formerly
ExplicitScanReplacementFilter) and am now getting 1.45M rows per sec (a > 350%
speedup compared to the default implementation). The raw scan (no columns
specified) runs at 1.6M rows per sec, so it's only a 10% performance hit.
The Phoenix team should be happy today. I am opening a JIRA.
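
The two percentages can be checked with a few lines of arithmetic. The sketch below assumes the baseline for the speedup is the 400K rows per sec figure reported elsewhere in this thread for the default explicit-columns scan; the class and method names are made up for illustration.

```java
/** Worked arithmetic for the throughput figures quoted in this thread. */
class ScanThroughputMath {
    /** How many times faster newRate is than baselineRate. */
    static double speedupFactor(double newRate, double baselineRate) {
        return newRate / baselineRate;
    }

    /** Percentage by which rate falls short of referenceRate. */
    static double shortfallPercent(double rate, double referenceRate) {
        return (referenceRate - rate) / referenceRate * 100.0;
    }

    public static void main(String[] args) {
        double filterRate = 1_450_000;  // rows/sec with the tweaked filter
        double defaultRate = 400_000;   // rows/sec, ASSUMED baseline (default explicit-columns scan)
        double rawRate = 1_600_000;     // rows/sec with no columns specified

        System.out.printf("speedup factor: %.3f%n", speedupFactor(filterRate, defaultRate));
        System.out.printf("shortfall vs raw scan: %.1f%%%n", shortfallPercent(filterRate, rawRate));
    }
}
```

Under that assumption the factor is 1.45M / 400K = 3.625, consistent with "> 350%", and the shortfall against the raw scan is 0.15M / 1.6M = 9.375%, which rounds to the quoted 10%.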

Re: Scanner with explicit columns list is very slow

Posted by Vladimir Rodionov <vl...@gmail.com>.

Yes, I load data into HRegion (with CACHE_ON_WRITE), then call flushcache()
(no data in memstore).

This is what I found: the default implementation of ExplicitColumnMatcher
is (possibly) tuned for very large rows; I would say very, very large. We need a
hint on the scan that tells StoreScanner which strategy to use:

1. For very large rows: ExplicitColumnMatcher with reseeks (what we have
currently).
2. For small/medium rows: remove the explicit columns/families from the Scan
and replace them with an additional filter that keeps the columnFamilyMap
from the scan and verifies that every KV matches this map.

I have created such a filter (ExplicitScanReplacementFilter) and verified
that it works much better than strategy 1 for small rows. For 1 CF + 5 CQs and
a Scan with 2 CQs I get:

400K rows per sec with the default implementation
1.25M rows per sec with ExplicitScanReplacementFilter

I will optimize ExplicitScanReplacementFilter further and will probably
get 1.4-1.5M rows per sec tomorrow.
We need a JIRA, and I will open one tomorrow.
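
The idea behind strategy 2 can be sketched as follows. This is not the ExplicitScanReplacementFilter mentioned above, whose source is not shown in this thread. It is a hypothetical, self-contained stand-in that uses String keys where HBase uses byte[], and it only illustrates the "keep the map, check every KV" approach.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the column-map filter idea; not the author's code. */
class ColumnMapFilterSketch {
    // family -> wanted qualifiers; an empty set means "the whole family"
    private final Map<String, Set<String>> familyMap = new HashMap<>();

    ColumnMapFilterSketch addColumn(String family, String qualifier) {
        familyMap.computeIfAbsent(family, f -> new HashSet<>()).add(qualifier);
        return this;
    }

    ColumnMapFilterSketch addFamily(String family) {
        familyMap.put(family, new HashSet<>());
        return this;
    }

    /** Checked once per KV; the scan itself only ever steps forward and never reseeks. */
    boolean include(String family, String qualifier) {
        Set<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            return false; // family was not requested at all
        }
        return qualifiers.isEmpty() || qualifiers.contains(qualifier);
    }

    public static void main(String[] args) {
        ColumnMapFilterSketch filter = new ColumnMapFilterSketch()
                .addColumn("CF", "CQ1")
                .addColumn("CF", "CQ3");
        for (String cq : new String[] {"CQ1", "CQ2", "CQ3", "CQ4", "CQ5"}) {
            System.out.println(cq + " -> " + filter.include("CF", cq));
        }
    }
}
```

The trade-off is the one described above: every KV of a row is visited, which is cheap for small rows and wasteful for very large ones.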

-Vladimir


On Mon, Oct 14, 2013 at 9:38 PM, lars hofhansl <la...@apache.org> wrote:

> Interesting. Thanks for doing the testing/profiling Vladimir!
>
>
> Generally reseeks are better if they can skip many KVs.
>
> For example, if you have many versions of the same row/col,
> INCLUDE_NEXT_COL will be better than issuing many INCLUDEs; the same goes
> for INCLUDE_NEXT_ROW if there are many columns.
>
> Since the number of columns/versions is not known at scan time (and can in
> fact vary between rows), it is hard to always do the right thing. It also
> depends on how large the KVs are on average. So replacing INCLUDE_NEXT_XXX
> with INCLUDE is not always the right idea.
>
>
> Thinking aloud... We could take the VERSIONS setting of the column family
> into account as a guideline for the expected number of versions (but
> there's no guarantee about how many versions we'll actually have until
> we've had a compaction), and replace INCLUDE_NEXT_COL with INCLUDE if
> VERSIONS is small (maybe < 10 or so). Maybe that'd be worth a JIRA...
>
>
> There are some fixes in 0.94.12 (HBASE-8930, avoid a superfluous reseek in
> some cases), and HBASE-9732 might help in 0.94.13 (avoid memory fences on
> a volatile on each seek/reseek).
>
> It would also be nice to figure out why reseek is so much more expensive.
> If the KV we reseek to is in the same block, it should just scan forward;
> otherwise it'll look in the appropriate block. It is probably the creation
> of the fake KV we want to seek to (like firstOnRow, lastOnRow, etc.), in
> which case there's not much we can do.
>
>
> Lastly, I've not spent much time profiling the ExplicitColumnMatcher yet;
> looks like I should start doing that.
>
>
> So in your case everything is in the blockcache, no data in the memstore?
>
> -- Lars

Re: Scanner with explicit columns list is very slow

Posted by lars hofhansl <la...@apache.org>.

Interesting. Thanks for doing the testing/profiling Vladimir!

Generally reseeks are better if they can skip many KVs.

For example, if you have many versions of the same row/col, INCLUDE_NEXT_COL will be better than issuing many INCLUDEs; the same goes for INCLUDE_NEXT_ROW if there are many columns.

Since the number of columns/versions is not known at scan time (and can in fact vary between rows), it is hard to always do the right thing. It also depends on how large the KVs are on average. So replacing INCLUDE_NEXT_XXX with INCLUDE is not always the right idea.

Thinking aloud... We could take the VERSIONS setting of the column family into account as a guideline for the expected number of versions (but there's no guarantee about how many versions we'll actually have until we've had a compaction), and replace INCLUDE_NEXT_COL with INCLUDE if VERSIONS is small (maybe < 10 or so). Maybe that'd be worth a JIRA...
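
A minimal sketch of that hint, assuming the decision depends only on the configured VERSIONS value. The class, the method name, and the threshold constant are hypothetical; the value 10 is taken from the "maybe < 10 or so" remark and is not a tuned number.

```java
/** Hypothetical sketch of the VERSIONS-based hint; not HBase code. */
class VersionsHintSketch {
    enum MatchCode { INCLUDE, INCLUDE_NEXT_COL }

    // Taken from the "maybe < 10 or so" remark; an untuned assumption.
    static final int SMALL_VERSIONS_THRESHOLD = 10;

    /**
     * Picks the code to return once the matcher has taken what it wants from a column.
     * With few configured versions, stepping over the leftovers one by one is
     * assumed to be cheaper than a reseek to the next column.
     */
    static MatchCode codeForMatchedColumn(int configuredVersions) {
        return configuredVersions < SMALL_VERSIONS_THRESHOLD
                ? MatchCode.INCLUDE
                : MatchCode.INCLUDE_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println(codeForMatchedColumn(1));
        System.out.println(codeForMatchedColumn(100));
    }
}
```

As noted above, VERSIONS is only a guideline: before a compaction a column can hold more versions than configured, so a real implementation would need a fallback.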

There are some fixes in 0.94.12 (HBASE-8930, avoid a superfluous reseek in some cases), and HBASE-9732 might help in 0.94.13 (avoid memory fences on a volatile on each seek/reseek).

It would also be nice to figure out why reseek is so much more expensive. If the KV we reseek to is in the same block, it should just scan forward; otherwise it'll look in the appropriate block. It is probably the creation of the fake KV we want to seek to (like firstOnRow, lastOnRow, etc.), in which case there's not much we can do.

Lastly, I've not spent much time profiling the ExplicitColumnMatcher yet; looks like I should start doing that.

So in your case everything is in the blockcache, no data in the memstore?

-- Lars

________________________________
 From: Vladimir Rodionov <vl...@gmail.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org> 
Sent: Monday, October 14, 2013 2:49 PM
Subject: Re: Scanner with explicit columns list is very slow

One fast optimization:

There is no need to call reseek on INCLUDE_NEXT_COL - this is going to be
the same row in the same KeyValueScanner (currently on top of KeyValueHeap).


Re: Scanner with explicit columns list is very slow

Posted by Vladimir Rodionov <vl...@gmail.com>.

One quick optimization:

There is no need to call reseek on INCLUDE_NEXT_COL: the next column is going
to be in the same row, in the same KeyValueScanner (the one currently on top
of the KeyValueHeap).
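
One way to read this suggestion is "step forward until the column changes instead of building a seek key and reseeking". The toy sketch below illustrates that reading on a plain sorted list. It is not HBase code, and the KV class and nextColumnIndex method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Toy sketch, not HBase code: reach the next column by stepping forward instead of reseeking. */
class SkipToNextColumnSketch {
    /** Minimal stand-in for a KeyValue: row, qualifier, timestamp. */
    static final class KV {
        final String row;
        final String qualifier;
        final long timestamp;

        KV(String row, String qualifier, long timestamp) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /**
     * Returns the index of the first KV after position from that is in a
     * different column or a different row. Assumes kvs is sorted by row, then
     * qualifier, then timestamp descending.
     */
    static int nextColumnIndex(List<KV> kvs, int from) {
        KV current = kvs.get(from);
        int i = from + 1;
        while (i < kvs.size()
                && kvs.get(i).row.equals(current.row)
                && kvs.get(i).qualifier.equals(current.qualifier)) {
            i++; // skip an older version of the same column
        }
        return i;
    }

    public static void main(String[] args) {
        List<KV> kvs = Arrays.asList(
                new KV("r1", "cq1", 3L),
                new KV("r1", "cq1", 2L),
                new KV("r1", "cq2", 1L),
                new KV("r2", "cq1", 1L));
        System.out.println(nextColumnIndex(kvs, 0));
    }
}
```

Stepping is only a win when few KVs have to be skipped; with many versions per column a reseek can still be the cheaper choice.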




On Mon, Oct 14, 2013 at 2:46 PM, Vladimir Rodionov
<vl...@gmail.com> wrote:

> I profiled the last test case (5 columns total and 2 in a scan).
>
> About 80% of StoreScanner.next() execution time is spent in:
>
> StoreScanner.reseek() - 71%
> ScanQueryMatcher.getKeyForNextColumn() - 6%
> ScanQueryMatcher.getKeyForNextRow() - 2%
>
> Should I open a JIRA?

Re: Scanner with explicit columns list is very slow

Posted by Vladimir Rodionov <vl...@gmail.com>.

I profiled the last test case (5 columns total and 2 in a scan).

About 80% of StoreScanner.next() execution time is spent in:

StoreScanner.reseek() - 71%
ScanQueryMatcher.getKeyForNextColumn() - 6%
ScanQueryMatcher.getKeyForNextRow() - 2%

Should I open a JIRA?


On Mon, Oct 14, 2013 at 2:03 PM, Vladimir Rodionov
<vl...@gmail.com> wrote:

> I modified the tests:
>
> I have now created a table with one CF and 5 columns: CQ1, ..., CQ5
>
> 1. Scan.addColumn(CF, CQ1);
>    Scan.addColumn(CF, CQ3);
>
> 2. Scan.addFamily(CF);
>
> Scan performance from block cache:
>
> 1. 400K rows per sec
> 2. 1.6M rows per sec
>
> The explicit-columns scan performance is even worse in this case. It is
> much faster to scan the WHOLE rows and filter columns later in a Filter
> than to specify columns directly in a Scan.
>
> This definitely needs to be explained/investigated.

Re: Scanner with explicit columns list is very slow

Posted by Vladimir Rodionov <vl...@gmail.com>.

I modified the tests:

I have now created a table with one CF and 5 columns: CQ1, ..., CQ5

1. Scan.addColumn(CF, CQ1);
   Scan.addColumn(CF, CQ3);

2. Scan.addFamily(CF);

Scan performance from block cache:

1. 400K rows per sec
2. 1.6M rows per sec

The explicit-columns scan performance is even worse in this case. It is
much faster to scan the WHOLE rows and filter columns later in a Filter
than to specify columns directly in a Scan.

This definitely needs to be explained/investigated.


On Mon, Oct 14, 2013 at 11:18 AM, Vladimir Rodionov <vrodionov@carrieriq.com> wrote:

> This is 0.94.6, and there is a chance that the issue has already been fixed.
>
> Simple table: one column family + one qualifier
>
> Two types of scans:
>
> 1. Scan.addFamily(CF)
>
> 2. Scan.addColumn(CF, CQ)
>
> Both run on block cache (all data in memory)
>
> Tested on StoreScanner directly.
>
> 1. 4.2M KVs per second per thread
> 2. 1.5M KVs per second per thread
>
> The difference? The first scanner's ScanQueryMatcher returns INCLUDE, DONE;
> the second returns INCLUDE_NEXT_ROW, DONE.
> The cost of the row reseek is huge.