Posted to dev@phoenix.apache.org by "Anoop Sam John (JIRA)" <ji...@apache.org> on 2014/02/20 04:35:19 UTC
[jira] [Comment Edited] (PHOENIX-29) Add custom filter to more efficiently navigate KeyValues in row

    [ https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906564#comment-13906564 ] 

Anoop Sam John edited comment on PHOENIX-29 at 2/20/14 3:33 AM:
----------------------------------------------------------------

bq.Why not add this filter in the beginning?
No, Lars, we cannot. Consider the query below.
Select name, address from people where age=25;
The new filter will contain only these 2 columns (name, address), and all other KVs will be filtered out. For the condition we will have SCVF (SingleColumnValueFilter), which then comes as the 2nd filter. As the 1st filter filters out the age KVs, the SCVF will never see the condition column's KV.
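
To make the ordering problem concrete, here is a minimal sketch in plain Java. It does not use the HBase or Phoenix APIs: `project` and `evaluate` are hypothetical stand-ins for the projection filter and the condition filter, and a row is modelled as a simple column-to-value map. What a real condition filter does when its column is missing depends on its configuration, so the sketch only reports that the column was never seen.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class FilterOrderSketch {
    // Stand-in for the projection filter: keep only the projected columns.
    public static Map<String, String> project(Map<String, String> row, Set<String> projected) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> kv : row.entrySet()) {
            if (projected.contains(kv.getKey())) {
                out.put(kv.getKey(), kv.getValue());
            }
        }
        return out;
    }

    // Stand-in for the condition filter: empty if the condition column never reached it.
    public static Optional<Boolean> evaluate(Map<String, String> seen, String column, String expected) {
        if (!seen.containsKey(column)) {
            return Optional.empty();
        }
        return Optional.of(expected.equals(seen.get(column)));
    }

    // Made-up sample row for the people table in the query above.
    public static Map<String, String> sampleRow() {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("address", "12 Main St");
        row.put("age", "25");
        row.put("name", "Asha");
        return row;
    }

    public static void main(String[] args) {
        Set<String> projected = new HashSet<>(Arrays.asList("name", "address"));
        Map<String, String> row = sampleRow();

        // Projection first: the age KV is gone before the condition is checked.
        System.out.println(evaluate(project(row, projected), "age", "25"));

        // Condition first, projection last: the condition sees age, then the row is trimmed.
        System.out.println(evaluate(row, "age", "25") + " " + project(row, projected).keySet());
    }
}
```

With the projection applied first, the condition has nothing to evaluate; with the projection applied last, the condition is checked against the full row and only then are the extra KVs dropped.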

bq.keep your filter at the end like you had it before and make the ExplainTable more forgiving of the FilterList order. It's better to have the PageFilter before yours so that it reduces the number of rows over which you're mucking with the KeyValues.
Yes, I think I can keep it at the end. My reason for wanting PageFilter at the end may not be an issue after all. My thinking was that any filter which depends on the number of rows is better placed at the end. But this particular combination of ColumnProjectionFilter and PageFilter looks fine. Lars can correct me if I am wrong.
PageFilter uses filterAllRemaining to signal that no more scanning is needed, so even if it is not at the end I feel it makes little difference.
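
The point about filterAllRemaining can be illustrated with a small simulation. This is only a sketch under stated assumptions: `RowFilter`, `PageLimit`, `NoOp` and `scan` are hypothetical, simplified stand-ins and not the real HBase Filter API, and the scan loop assumes that the scan ends as soon as any filter in the list reports that all remaining rows are filtered.

```java
import java.util.Arrays;
import java.util.List;

public class PageLimitSketch {
    // Hypothetical, simplified filter contract; not the real HBase Filter API.
    public interface RowFilter {
        boolean filterAllRemaining(); // true means: stop the scan
        void rowIncluded();           // called once per row that was returned
    }

    // Stand-in for a page filter: asks to stop once pageSize rows were returned.
    public static class PageLimit implements RowFilter {
        private final int pageSize;
        private int rows;

        public PageLimit(int pageSize) { this.pageSize = pageSize; }
        public boolean filterAllRemaining() { return rows >= pageSize; }
        public void rowIncluded() { rows++; }
    }

    // Stand-in for any filter that never asks to stop the scan.
    public static class NoOp implements RowFilter {
        public boolean filterAllRemaining() { return false; }
        public void rowIncluded() { }
    }

    // Returns how many rows were visited before some filter ended the scan.
    public static int scan(int totalRows, List<RowFilter> filters) {
        int visited = 0;
        for (int r = 0; r < totalRows; r++) {
            boolean stop = false;
            for (RowFilter f : filters) {
                stop |= f.filterAllRemaining();
            }
            if (stop) {
                break;
            }
            visited++;
            for (RowFilter f : filters) {
                f.rowIncluded();
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Same page size, page limit first versus last in the list.
        System.out.println(scan(100, Arrays.<RowFilter>asList(new PageLimit(10), new NoOp())));
        System.out.println(scan(100, Arrays.<RowFilter>asList(new NoOp(), new PageLimit(10))));
    }
}
```

Under that assumption the number of rows visited is the same wherever the page limit sits in the list; what the position can change is how much per-KV work the other filters do on each row.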

Still, I am +1 on James's suggestion to keep it at the end, as in patch V1. I will make that change and post the new version once the UTs pass.

Dealing with combinations of Filters in a FilterList is tricky. I wonder how easy or difficult it is for users. Without some knowledge of the internal code flow, things can sometimes go wrong. :(





> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-29
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-29.patch, PHOENIX-29_V2.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than at selecting any other column. The reason is that when you project a column into a Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the column. The only case where this is not necessary is when the column is the first one.
> In most cases (unless you have thousands of versions), it'd be more efficient to just do a NEXT instead of a reseek (especially if your KV is the next one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that need to be returned back to the client which is another advantage we'd get writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and merge between them and the incoming KVs, using NEXT instead of a reseek. We could potentially use a reseek if the number of columns in the table is beyond a certain threshold.
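
The merge described in the issue can be sketched as a single forward pass over two sorted lists. This is an illustration only, not the filter from the patch: `select` is a hypothetical helper, plain String ordering stands in for KeyValue.COMPARATOR, and versions, timestamps and the reseek threshold are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeNextSketch {
    // Walks the row's sorted columns once, matching them against the sorted wanted columns.
    public static List<String> select(List<String> rowColumns, List<String> wanted) {
        List<String> matched = new ArrayList<>();
        int w = 0;
        // Each step of i corresponds to one NEXT on the incoming KVs; there is no reseek.
        for (int i = 0; i < rowColumns.size() && w < wanted.size(); i++) {
            String col = rowColumns.get(i);
            // Skip wanted columns that sort before the current cell: the row does not have them.
            while (w < wanted.size() && wanted.get(w).compareTo(col) < 0) {
                w++;
            }
            if (w < wanted.size() && wanted.get(w).equals(col)) {
                matched.add(col);
                w++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c", "d", "e");
        List<String> wanted = Arrays.asList("b", "d", "x");
        System.out.println(select(row, wanted));
    }
}
```

The loop stops as soon as every wanted column has been matched or passed, so a row with many trailing columns does not have to be walked to its end.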



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)