Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2014/02/08 01:34:21 UTC

[jira] [Created] (PHOENIX-29) Add custom filter to more efficiently navigate KeyValues in row

James Taylor created PHOENIX-29:
-----------------------------------

             Summary: Add custom filter to more efficiently navigate KeyValues in row
                 Key: PHOENIX-29
                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
             Project: Phoenix
          Issue Type: Bug
            Reporter: James Taylor


Currently, HBase is 50% faster at selecting the first KV in a row than at selecting any other column. The reason is that when you project a column into a Scan, HBase uses its ExplicitColumnTracker, which does a reseek to that column. The only case where the reseek is not necessary is when the column is the first one in the row.

In most cases (unless you have thousands of versions), it would be more efficient to do a NEXT instead of a reseek, especially if the KV you want is the very next one. We can provide our own custom filter, to which we pass two lists:
1) all KVs referenced in the select expressions. These are the only ones that need to be returned to the client, which is another advantage of writing this custom filter.
2) all KVs referenced in the WHERE clause.
The filter could sort the KVs using the standard KeyValue.COMPARATOR and merge them against the incoming KVs, using NEXT instead of a reseek. We could potentially fall back to a reseek if the number of columns in the table is beyond a certain threshold.
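
As a rough illustration of the merge described above, here is a minimal sketch in plain Java. It deliberately does not use the HBase Filter API: the class ColumnMergeSketch, its scanRow method and the Result holder are hypothetical names, column qualifiers are modeled as Strings, and String ordering stands in for KeyValue.COMPARATOR. It also assumes one version per column. Stepping to the next element of the row plays the role of NEXT; no reseek is modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Phoenix or HBase code.
final class ColumnMergeSketch {

    // Columns to send to the client, and columns seen only for WHERE evaluation.
    static final class Result {
        final List<String> returned = new ArrayList<>();
        final List<String> whereSeen = new ArrayList<>();
    }

    // rowQualifiers: qualifiers of the incoming KVs of one row, already in sorted order.
    // selectCols:    columns referenced in the select expressions.
    // whereCols:     columns referenced in the WHERE clause.
    static Result scanRow(List<String> rowQualifiers,
                          Set<String> selectCols,
                          Set<String> whereCols) {
        // Sort the union of both lists once, up front.
        TreeSet<String> union = new TreeSet<>(selectCols);
        union.addAll(whereCols);
        String[] wanted = union.toArray(new String[0]);

        Result result = new Result();
        int w = 0; // cursor into the sorted wanted columns
        for (String q : rowQualifiers) {
            // Wanted columns that sort before q are absent from this row.
            while (w < wanted.length && wanted[w].compareTo(q) < 0) {
                w++;
            }
            if (w == wanted.length) {
                break; // nothing left to find; the rest of the row can be skipped
            }
            if (wanted[w].equals(q)) {
                if (whereCols.contains(q)) {
                    result.whereSeen.add(q); // needed to evaluate the WHERE clause
                }
                if (selectCols.contains(q)) {
                    result.returned.add(q);  // only these go back to the client
                }
                w++;
            }
            // Otherwise q is not wanted: just step to the next KV (NEXT).
        }
        return result;
    }
}
```

For a row with columns a through e, select columns {b, d} and WHERE column {c}, scanRow returns b and d, records c as seen for WHERE evaluation only, and stops before e. A real filter would additionally have to handle multiple versions and the column-count threshold mentioned above.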



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)