Posted to dev@phoenix.apache.org by "Maryann Xue (JIRA)" <ji...@apache.org> on 2014/02/15 05:49:19 UTC

[jira] [Commented] (PHOENIX-29) Add custom filter to more efficiently navigate KeyValues in row

    [ https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902327#comment-13902327 ] 

Maryann Xue commented on PHOENIX-29:
------------------------------------

By how much would it improve "selecting" a specific column (KeyValue)? And by how much for filtering on a column (KV) in the WHERE clause?
And how does that improvement depend on the position of the KV in the row, say 2nd, 3rd, or 4th?

> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-29
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-29.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than at selecting any other column. The reason is that when you project a column into a Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the column. The only case where this is not necessary is when the column is the first one.
> In most cases (unless you have thousands of versions), it'd be more efficient to just do a NEXT instead of a reseek (especially if your KV is the next one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that need to be returned to the client, which is another advantage we'd get from writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and merge them with the incoming KVs, using NEXT instead of a reseek. We could potentially use a reseek if the number of columns in the table is beyond a certain threshold.
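
The merge described in the issue can be sketched as a small standalone Java model. This is not the attached PHOENIX-29.patch and uses no HBase classes. Assumptions: column qualifiers are plain Strings compared with String.compareTo as a stand-in for KeyValue.COMPARATOR; the ReturnCode enum only loosely mirrors HBase's Filter.ReturnCode (SKIP here means "step to the next KV", i.e. a NEXT); and ColumnMergeSketch, filterColumn, nextHint, scanRow and the RESEEK_THRESHOLD value of 16 are hypothetical names and numbers chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical standalone model of the PHOENIX-29 merge idea (no HBase API used). */
public class ColumnMergeSketch {

    // Hypothetical return codes, loosely modeled on HBase's Filter.ReturnCode.
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT, NEXT_ROW }

    // Hypothetical threshold: with more table columns than this, prefer reseeks.
    static final int RESEEK_THRESHOLD = 16;

    private final String[] wanted;          // sorted union of SELECT and WHERE columns
    private final boolean[] returnToClient; // true if the column appears in SELECT
    private final boolean useReseek;
    private int pos;                        // index of the next wanted column in this row

    ColumnMergeSketch(List<String> selectCols, List<String> whereCols, int tableColumnCount) {
        // TreeMap keeps the merged columns sorted; value = referenced in SELECT.
        TreeMap<String, Boolean> merged = new TreeMap<>();
        for (String c : whereCols) merged.put(c, false);
        for (String c : selectCols) merged.put(c, true); // SELECT wins if in both lists
        wanted = merged.keySet().toArray(new String[0]);
        returnToClient = new boolean[wanted.length];
        int i = 0;
        for (boolean b : merged.values()) returnToClient[i++] = b;
        useReseek = tableColumnCount > RESEEK_THRESHOLD;
    }

    void reset() { pos = 0; } // call at the start of each row

    ReturnCode filterColumn(String qualifier) {
        // Wanted columns sorting before this qualifier are absent from the row.
        while (pos < wanted.length && wanted[pos].compareTo(qualifier) < 0) pos++;
        if (pos == wanted.length) return ReturnCode.NEXT_ROW; // nothing left to find
        if (qualifier.equals(wanted[pos])) {
            // SELECT columns are returned; WHERE-only columns are seen here but not returned.
            return returnToClient[pos++] ? ReturnCode.INCLUDE : ReturnCode.SKIP;
        }
        // Not a wanted column: step with NEXT, or jump with a reseek past the threshold.
        return useReseek ? ReturnCode.SEEK_NEXT_USING_HINT : ReturnCode.SKIP;
    }

    String nextHint() { return wanted[pos]; } // target column for a reseek

    /** Drives the filter over one row's sorted qualifiers, emulating NEXT and reseek. */
    static List<String> scanRow(ColumnMergeSketch f, String[] row) {
        List<String> trace = new ArrayList<>();
        f.reset();
        int i = 0;
        while (i < row.length) {
            ReturnCode rc = f.filterColumn(row[i]);
            trace.add(row[i] + ":" + rc);
            if (rc == ReturnCode.NEXT_ROW) break;
            if (rc == ReturnCode.SEEK_NEXT_USING_HINT) {
                String hint = f.nextHint();
                while (i < row.length && row[i].compareTo(hint) < 0) i++; // reseek
            } else {
                i++; // NEXT
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        String[] row = {"a", "b", "c", "d", "e"};
        List<String> select = List.of("c");
        List<String> where = List.of("e");
        // prints [a:SKIP, b:SKIP, c:INCLUDE, d:SKIP, e:SKIP]
        System.out.println(scanRow(new ColumnMergeSketch(select, where, 5), row));
        // prints [a:SEEK_NEXT_USING_HINT, c:INCLUDE, d:SEEK_NEXT_USING_HINT, e:SKIP]
        System.out.println(scanRow(new ColumnMergeSketch(select, where, 100), row));
    }
}
```

With a narrow table the filter walks every KV with NEXT; above the (hypothetical) threshold it reseeks over a, b and d instead. In both cases e reaches the filter, so a WHERE expression on it could be evaluated, but it is not returned because it is not projected.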



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)