Posted to issues@phoenix.apache.org by "Lars Hofhansl (Jira)" <ji...@apache.org> on 2021/03/16 06:00:03 UTC

[jira] [Comment Edited] (PHOENIX-6412) Consider batching uncovered column merge for local indexes

    [ https://issues.apache.org/jira/browse/PHOENIX-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302238#comment-17302238 ] 

Lars Hofhansl edited comment on PHOENIX-6412 at 3/16/21, 5:59 AM:
------------------------------------------------------------------

Performance-wise, when using FAST_DIFF on the main data CF, I see hardly any improvement, though.

Looks like RESEEK with FAST_DIFF is hardly any faster than a full SEEK each time. Since the data region is local there is no RPC overhead. All the time is simply spent in the FAST_DIFF decoder.

I did see an improvement when I switched the block encoding to ROW_INDEX_V1. Overall, though, this does not seem to be worth the effort.

[~kozdemir], FYI. Not what I had expected. But I guess it makes sense.


was (Author: lhofhansl):
Performancewise when using FAST_DIFF on the main data CF, I see hardly any improvement, though.

Looks like RESEEK with FAST_DIFF is hardly any faster than a full SEEK each time. Since the data region is local there is not RPC overhead.

I did see an improvement when I switch the block encoding to ROW_INDEX_V1. Overall, though, this does not seem to be worth the effort.

[~kozdemir], FYI. Not what I had expected. But I guess it makes sense.

> Consider batching uncovered column merge for local indexes
> ----------------------------------------------------------
>
>                 Key: PHOENIX-6412
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6412
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Minor
>             Fix For: 5.2.0
>
>         Attachments: 6412-hack.txt
>
>
> Currently uncovered columns are merged row-by-row, performing a Get to the data region for each matching row in the index region.
> Each Get needs to seek all the store scanners, and doing this per row is quite expensive.
> Instead we could batch inside the RegionScannerFactory.getWrappedScanner() -> RegionScanner.nextRaw() method. Collect N index rows and then execute a single skip scan on the data region. 
> I might be able to get to that, but if there's someone who is interested in taking this up, I would not mind :)
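
For illustration only, the difference between the row-by-row merge and the batched merge described above could be sketched roughly as follows. This is not the attached 6412-hack.txt and it uses no Phoenix or HBase APIs: a sorted in-memory map stands in for the data region, and the names BatchedMergeSketch, mergePerRow, mergeBatched and skipScan are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch: a sorted map plays the role of the data region.
// All names here are hypothetical, not Phoenix or HBase APIs.
class BatchedMergeSketch {

    // Row-by-row merge: one independent point lookup (a "Get") per index row.
    static List<String> mergePerRow(NavigableMap<String, String> dataRegion,
                                    List<String> indexRowKeys) {
        List<String> merged = new ArrayList<>();
        for (String key : indexRowKeys) {
            String uncovered = dataRegion.get(key);
            if (uncovered != null) {
                merged.add(key + "=" + uncovered);
            }
        }
        return merged;
    }

    // Batched merge: take up to batchSize index rows at a time and resolve
    // each batch with a single forward pass over the data region.
    static List<String> mergeBatched(NavigableMap<String, String> dataRegion,
                                     List<String> indexRowKeys, int batchSize) {
        List<String> merged = new ArrayList<>();
        for (int start = 0; start < indexRowKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, indexRowKeys.size());
            List<String> batch = indexRowKeys.subList(start, end);
            Map<String, String> found = skipScan(dataRegion, batch);
            // Emit results in index order, as the row-by-row version does.
            for (String key : batch) {
                String uncovered = found.get(key);
                if (uncovered != null) {
                    merged.add(key + "=" + uncovered);
                }
            }
        }
        return merged;
    }

    // Stand-in for a skip scan: sort the wanted data row keys, then only
    // ever seek forward from the current position.
    static Map<String, String> skipScan(NavigableMap<String, String> dataRegion,
                                        List<String> batch) {
        List<String> sorted = new ArrayList<>(batch);
        Collections.sort(sorted);
        Map<String, String> found = new HashMap<>();
        NavigableMap<String, String> remaining = dataRegion;
        for (String key : sorted) {
            remaining = remaining.tailMap(key, true); // forward-only "reseek"
            Map.Entry<String, String> first = remaining.firstEntry();
            if (first != null && first.getKey().equals(key)) {
                found.put(key, first.getValue());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> data = new TreeMap<>();
        data.put("r1", "a");
        data.put("r2", "b");
        data.put("r3", "c");
        // Index order usually differs from data row key order.
        List<String> indexRows = Arrays.asList("r3", "r1", "r2");
        System.out.println(mergeBatched(data, indexRows, 2));
    }
}
```

In this sketch the batch is sorted by data row key before the pass, because rows arrive in index order rather than data row key order. Whether the forward-only reseeks are actually cheaper than independent seeks depends on the block encoding, as the comment above notes for FAST_DIFF.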



--
This message was sent by Atlassian Jira
(v8.3.4#803005)