Posted to issues@hbase.apache.org by "Zheng Hu (JIRA)" <ji...@apache.org> on 2017/06/05 09:51:04 UTC

[jira] [Comment Edited] (HBASE-17678) FilterList with MUST_PASS_ONE lead to redundancy cells returned

    [ https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036735#comment-16036735 ] 

Zheng Hu edited comment on HBASE-17678 at 6/5/17 9:50 AM:
----------------------------------------------------------

[~zghaobac],  I created a mock filter to test whether the cell passed to each filter in a FilterList is the expected cell (patch v4), and found some problems in FilterList.java:
1.  FilterList does not handle the INCLUDE_AND_SEEK_NEXT_ROW case (it seems INCLUDE_AND_SEEK_NEXT_ROW is a newly added return code and FilterList was never updated for it). So if a developer returns INCLUDE_AND_SEEK_NEXT_ROW from their own Filter and wraps it in a FilterList, it will throw IllegalStateException("Received code is not valid.").
2.  For a FilterList with MUST_PASS_ONE, if filter-A in the list returns INCLUDE and filter-B returns INCLUDE_AND_NEXT_COL, the FilterList ends up returning INCLUDE_AND_NEXT_COL. That is incorrect: a FilterList with MUST_PASS_ONE should choose the minimal step among the filters in the list (let's call it the Minimal Step Rule); see the sketch below.

I opened another issue, HBASE-18160, for the problems above; let's fix this issue first.
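
To make the Minimal Step Rule concrete, here is a rough sketch (the object and method names are invented for illustration; this is not the actual FilterList.java code) of how a MUST_PASS_ONE list could merge the three include-type return codes mentioned above:

{code:none}
import org.apache.hadoop.hbase.filter.Filter.ReturnCode

// Illustrative helper only -- not FilterList.java. It ranks the include-type
// return codes by how far they advance the scan, and MUST_PASS_ONE keeps the
// smallest step so that no filter in the list misses cells it still wants.
object MinimalStepRule {
  private def stepSize(rc: ReturnCode): Int = rc match {
    case ReturnCode.INCLUDE                   => 0 // stay at the current cell
    case ReturnCode.INCLUDE_AND_NEXT_COL      => 1 // jump to the next column
    case ReturnCode.INCLUDE_AND_SEEK_NEXT_ROW => 2 // jump to the next row
    case other => throw new IllegalArgumentException("not an include code: " + other)
  }

  def mergeForMustPassOne(a: ReturnCode, b: ReturnCode): ReturnCode =
    if (stepSize(a) <= stepSize(b)) a else b
}
{code}

Under this rule, merging INCLUDE with INCLUDE_AND_NEXT_COL yields INCLUDE, whereas the current FilterList returns INCLUDE_AND_NEXT_COL and therefore skips cells that the INCLUDE filter still wants to see.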



> FilterList with MUST_PASS_ONE lead to redundancy cells returned
> ---------------------------------------------------------------
>
>                 Key: HBASE-17678
>                 URL: https://issues.apache.org/jira/browse/HBASE-17678
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 2.0.0, 1.3.0, 1.2.1
>         Environment: RedHat 7.x
>            Reporter: Jason Tokayer
>            Assignee: Zheng Hu
>         Attachments: HBASE-17678.v1.patch, HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, HBASE-17678.v3.patch, HBASE-17678.v4.patch, TestColumnPaginationFilterDemo.java
>
>
> When combining a ColumnPaginationFilter with a single-element FilterList, MUST_PASS_ONE and MUST_PASS_ALL give different results when there are multiple cells with the same timestamp. This is unexpected, since there is only a single filter in the list, and I would expect MUST_PASS_ALL and MUST_PASS_ONE to affect only how the filters in the list are combined, not the behavior of any individual filter. If this is not a bug, then it would be nice if the documentation were updated to explain this nuanced behavior.
> I know that there was a decision made in an earlier HBase version to keep multiple cells with the same timestamp. This is generally fine, but it presents an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> put 'ns:tbl','row','family:name','Gil',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> {code}
> Then, run the following from a Scala client:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
> 	val table = connection.getTable(TableName.valueOf("ns:tbl"))
> 	val paginationFilter = new ColumnPaginationFilter(limit,offset)
> 	val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
> 	println("@ filterList = "+filterList)
> 	val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
> 	val cells = results.rawCells()
> 	if (cells != null) {
> 		for (cell <- cells) {
> 		  val value = new String(CellUtil.cloneValue(cell))
> 		  val qualifier = new String(CellUtil.cloneQualifier(cell))
> 		  val family = new String(CellUtil.cloneFamily(cell))
> 		  val result = "OFFSET = "+offset+":"+family + "," + qualifier + "," + value + "," + cell.getTimestamp()
> 		  resultsList.append(result)
> 		}
> 	}
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 1:family,name,Gil,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> OFFSET = 3:family,name,John,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to return only a single (non-duplicated) cell within a page, but not across pages.
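> For comparison, the implied baseline is the same loop with the ColumnPaginationFilter set directly on the Get, with no FilterList at all; a minimal sketch of that variant (untested, reusing the names from the snippet above):
> {code:none}
> // Same loop body as above, but the pagination filter is applied directly,
> // without wrapping it in a single-element FilterList (sketch for comparison):
> val paginationFilter = new ColumnPaginationFilter(limit, offset)
> val results = table.get(new Get(Bytes.toBytes("row")).setFilter(paginationFilter))
> {code}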



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)