Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/09/10 01:06:08 UTC

[jira] [Commented] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters

    [ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101629#comment-13101629 ] 

Todd Lipcon commented on HBASE-4364:
------------------------------------

Example shell code to reproduce this:
{noformat}
create 't1', 'f1', 'f2'
put 't1', 'r1', 'f1:word', 'hello'
put 't1', 'r1', 'f2:word', 'bonjour'
put 't1', 'r2', 'f1:word', 'goodbye'
put 't1', 'r2', 'f2:word', 'au revoir'

# scan whole table, has 2 rows, each with 2 cols
scan 't1'
# scan selecting only one column - returns 2 distinct rows
scan 't1', { COLUMNS => ['f1:word'] }
# scan with a predicate on the French word (> 'b') - returns 1 row
scan 't1', { FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')"  }
# scan with the same predicate on the French word, selecting only the English word
scan 't1', { COLUMNS => ['f1:word'], FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')"  }
{noformat}

The output is as follows; the result of the last scan is incorrect:
{noformat}
hbase(main):008:0> scan 't1'
ROW                                COLUMN+CELL                                                                                       
 r1                                column=f1:word, timestamp=1315608975212, value=hello                                              
 r1                                column=f2:word, timestamp=1315608975238, value=bonjour                                            
 r2                                column=f1:word, timestamp=1315608975258, value=goodbye                                            
 r2                                column=f2:word, timestamp=1315608975286, value=au revoir                                          
2 row(s) in 0.0270 seconds

hbase(main):009:0> scan 't1', { COLUMNS => ['f1:word'] }
ROW                                COLUMN+CELL                                                                                       
 r1                                column=f1:word, timestamp=1315608975212, value=hello                                              
 r2                                column=f1:word, timestamp=1315608975258, value=goodbye                                            
2 row(s) in 0.0140 seconds

hbase(main):010:0> scan 't1', { FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')"  }
ROW                                COLUMN+CELL                                                                                       
 r1                                column=f1:word, timestamp=1315608975212, value=hello                                              
 r1                                column=f2:word, timestamp=1315608975238, value=bonjour                                            
1 row(s) in 0.0250 seconds

hbase(main):011:0> scan 't1', { COLUMNS => ['f1:word'], FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')"  }
ROW                                COLUMN+CELL                                                                                       
 r1                                column=f1:word, timestamp=1315608975212, value=hello                                              
 r2                                column=f1:word, timestamp=1315608975258, value=goodbye                                            
2 row(s) in 0.0270 seconds <---- SHOULD NOT HAVE RETURNED ANY VALUE FOR r2!
{noformat}


> Column family pruning incorrectly prunes CFs referred to by filters
> -------------------------------------------------------------------
>
>                 Key: HBASE-4364
>                 URL: https://issues.apache.org/jira/browse/HBASE-4364
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4, 0.92.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> For a scan, if you select a set of columns using addColumns() and then apply a SingleColumnValueFilter that restricts the results based on other columns that are not selected, the filter conditions are ignored when those non-selected columns belong to a separate column family.
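
The behaviour described above can be illustrated with a small in-memory model in plain Ruby. This is only a sketch and not HBase code: the method names (passes_filter?, scan_prune_first, scan_filter_first) are hypothetical, the model prunes individual columns rather than whole column families, and it assumes that SingleColumnValueFilter is left at its default of letting a row through when the tested column is missing (filterIfMissing = false). Under those assumptions, pruning before the filter runs hides f2:word from the filter, so every row passes.

```ruby
# Toy in-memory model of the scan; this is NOT HBase code.
# Rows map "family:qualifier" => value, mirroring the shell example above.
ROWS = {
  'r1' => { 'f1:word' => 'hello',   'f2:word' => 'bonjour' },
  'r2' => { 'f1:word' => 'goodbye', 'f2:word' => 'au revoir' }
}.freeze

# Assumption: mimics SingleColumnValueFilter with filterIfMissing=false,
# so a row that lacks the tested column passes the filter.
def passes_filter?(cells, column, min)
  value = cells[column]
  value.nil? || value > min
end

# Buggy order: drop unselected columns first, then filter what is left.
def scan_prune_first(rows, columns, filter_column, min)
  rows.each_with_object({}) do |(key, cells), out|
    visible = cells.select { |col, _| columns.include?(col) }
    out[key] = visible if passes_filter?(visible, filter_column, min)
  end
end

# Expected order: filter on the full row, then project the selected columns.
def scan_filter_first(rows, columns, filter_column, min)
  rows.each_with_object({}) do |(key, cells), out|
    next unless passes_filter?(cells, filter_column, min)
    out[key] = cells.select { |col, _| columns.include?(col) }
  end
end

buggy    = scan_prune_first(ROWS, ['f1:word'], 'f2:word', 'b')
expected = scan_filter_first(ROWS, ['f1:word'], 'f2:word', 'b')
puts "prune first:  #{buggy.keys.inspect}"
puts "filter first: #{expected.keys.inspect}"
```

In this model the prune-first variant returns both r1 and r2, matching the incorrect shell output, while the filter-first variant returns only r1. Whether the real fix evaluates the filter before projection or instead adds the filter's column family to the scanned set is not stated in this issue.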
