FilterList with a ColumnPaginationFilter in Java (Scala) Client


I am having some difficulty understanding the results when I apply a ColumnPaginationFilter within a FilterList. I’m not sure whether this is an HBase bug or a gap in my understanding of how the API works.

Specifically, I’m seeing different results with MUST_PASS_ONE and MUST_PASS_ALL in my FilterList, even when the list contains only a single filter. Below is a complete but simplified example that illustrates the issue. (I removed the other filters from the list because I have narrowed the problem down to this one, but I do still need to use a FilterList.)
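
The FilterList javadoc describes MUST_PASS_ALL as AND and MUST_PASS_ONE as OR over the member filters, so with exactly one filter the two would be expected to agree. The sketch below is a deliberately simplified, boolean-only model of that expectation. The names FilterListModel, MustPassAll, MustPassOne and passes are hypothetical. The real FilterList combines Filter.ReturnCode values (INCLUDE, INCLUDE_AND_NEXT_COL, NEXT_COL, NEXT_ROW and so on) rather than booleans, so this model cannot capture any effect of the "skip ahead" part of those codes.

```scala
// Hypothetical, boolean-only model of how a filter list combines its members'
// verdicts for one cell. This is NOT HBase's FilterList code, which merges
// Filter.ReturnCode values instead of plain booleans.
object FilterListModel {
  sealed trait Operator
  case object MustPassAll extends Operator // analogous to FilterList.Operator.MUST_PASS_ALL (AND)
  case object MustPassOne extends Operator // analogous to FilterList.Operator.MUST_PASS_ONE (OR)

  // verdicts(i) is true when member filter i would include the cell.
  def passes(op: Operator, verdicts: Seq[Boolean]): Boolean = op match {
    case MustPassAll => verdicts.forall(identity) // every filter must include it
    case MustPassOne => verdicts.exists(identity) // at least one filter must include it
  }

  def main(args: Array[String]): Unit = {
    // With a single filter, AND and OR of one verdict are the same value.
    for (v <- Seq(true, false))
      println(s"verdict=$v all=${passes(MustPassAll, Seq(v))} one=${passes(MustPassOne, Seq(v))}")
  }
}
```

With two or more filters the operators diverge (for example, Seq(true, false) passes only under MustPassOne), but with one filter they cannot, at least in this boolean model.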

First, in the shell I create a table and insert multiple values with the same timestamp:
create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
put 'ns:tbl','row','family:name','John',1000000000000
put 'ns:tbl','row','family:name','Jane',1000000000000
put 'ns:tbl','row','family:name','Gil',1000000000000
put 'ns:tbl','row','family:name','Jane',1000000000000

Now, I create a custom client written in Scala that uses the Java APIs:

import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
import scala.collection.mutable.ListBuffer

val config = HBaseConfiguration.create()
config.set("hbase.zookeeper.quorum", "localhost")
config.set("hbase.zookeeper.property.clientPort", "2181")

val connection = ConnectionFactory.createConnection(config)
val table = connection.getTable(TableName.valueOf("ns:tbl"))

val logicalOp = FilterList.Operator.MUST_PASS_ALL // switched to MUST_PASS_ONE for the other runs
val limit = 1                                     // also run with limit = 2
val resultsList = ListBuffer[String]()
for (offset <- 0 to 20 by limit) {
  // A fresh pagination filter per Get, wrapped in a one-element FilterList
  val paginationFilter = new ColumnPaginationFilter(limit, offset)
  val filterList = new FilterList(logicalOp, paginationFilter)
  val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
  val cells = results.rawCells()
  if (cells != null) {
    for (cell <- cells) {
      val value = Bytes.toString(CellUtil.cloneValue(cell))
      val qualifier = Bytes.toString(CellUtil.cloneQualifier(cell))
      val family = Bytes.toString(CellUtil.cloneFamily(cell))
      // Record which offset produced which cell
      resultsList += "OFFSET = " + offset + ":" + family + "," + qualifier + "," + value + "," + cell.getTimestamp()
    }
  }
}
table.close()
connection.close()

My results look like:
limit = 1 & logicalOp = MUST_PASS_ALL:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000

limit = 1 & logicalOp = MUST_PASS_ONE:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
OFFSET = 1:family,name,Gil,1000000000000
OFFSET = 2:family,name,Jane,1000000000000
OFFSET = 3:family,name,John,1000000000000

limit = 2 & logicalOp = MUST_PASS_ALL:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000

limit = 2 & logicalOp = MUST_PASS_ONE:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
OFFSET = 2:family,name,Jane,1000000000000

My main question is: when using MUST_PASS_ONE, why don’t I get back only the single, most recently inserted value of the cell, as I do with MUST_PASS_ALL? Note that if I don’t use a FilterList at all and instead set the Get’s filter directly to the paginationFilter, I get the result I would expect (i.e. the single OFFSET = 0:family,name,Jane,1000000000000).
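
The expected behavior (the one MUST_PASS_ALL and the bare ColumnPaginationFilter produce) can be written down as a small pure-Scala model. It assumes the ColumnPaginationFilter javadoc's description: the filter pages over a row's distinct columns, and only the most recent version of each column is considered. The names ColumnPageModel, Version and expectedPage are hypothetical; this models the documented semantics and is not HBase's implementation. The input order (newest first) is taken from the MUST_PASS_ONE listing above.

```scala
// Hypothetical model of the documented ColumnPaginationFilter(limit, offset)
// semantics: page over a row's distinct columns, keeping only the newest
// version of each. Not HBase code.
object ColumnPageModel {
  // One stored version of a cell; timestamps may tie, as in the shell example.
  final case class Version(qualifier: String, value: String, timestamp: Long)

  // `versions` must list each column's versions newest first, so the first
  // entry per qualifier is the visible one. HBase orders qualifiers by raw
  // bytes; String ordering matches that for ASCII names such as "name".
  def expectedPage(versions: Seq[Version], limit: Int, offset: Int): Seq[Version] = {
    val newestPerColumn = versions.groupBy(_.qualifier).toSeq.sortBy(_._1).map(_._2.head)
    newestPerColumn.slice(offset, offset + limit) // columns [offset, offset + limit)
  }

  def main(args: Array[String]): Unit = {
    // The four versions of family:name in the order the MUST_PASS_ONE run listed them.
    val name = Seq("Jane", "Gil", "Jane", "John").map(v => Version("name", v, 1000000000000L))
    for (limit <- Seq(1, 2); offset <- 0 to 3 by limit)
      println(s"limit=$limit offset=$offset -> ${expectedPage(name, limit, offset).map(_.value)}")
  }
}
```

Because the row has a single column, only offset 0 yields a cell under this model, for both limit = 1 and limit = 2, which matches the MUST_PASS_ALL listings rather than the MUST_PASS_ONE ones.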

The documentation isn’t entirely clear about this situation, and I’m hoping someone on either mailing list may be able to assist.


