Posted to issues@hbase.apache.org by "Junegunn Choi (Jira)" <ji...@apache.org> on 2019/12/05 06:45:00 UTC

[jira] [Created] (HBASE-23370) PageFilter returns extra records even when page is filled within a region

Junegunn Choi created HBASE-23370:
-------------------------------------

             Summary: PageFilter returns extra records even when page is filled within a region
                 Key: HBASE-23370
                 URL: https://issues.apache.org/jira/browse/HBASE-23370
             Project: HBase
          Issue Type: Bug
    Affects Versions: 3.0.0
            Reporter: Junegunn Choi


I'm aware that the latest version of HBase has {{Scan#setLimit}}, which should nicely replace {{PageFilter}} in most use cases. However, I'd like to point out that the filter behaves strangely in the following scenario.

Let's say we have a table with 10 regions, and each region holds 100 records.
{code:ruby}
create 'page-filter', 'd', SPLITS => (1..9).map(&:to_s)
1000.times { |i| put 'page-filter', format('%04d', i).reverse, 'd:foo', 'bar' }
{code}
If I scan the table with {{PageFilter(30)}}, I'd expect to see only 30 records. {{PageFilter}} does not guarantee that the number of returned records stays within the specified size, because the filter is applied separately on different region servers. In this case, however, the first region alone holds more than 30 records, so the page is filled there and the filter should terminate the scan immediately.
{code:ruby}
scan 'page-filter', FILTER => 'PageFilter(30)'
{code}
However, this returns 300 records: the first 30 records of each of the 10 regions. The client keeps advancing to the next region when it shouldn't, because of the {{results.isEmpty()}} condition in the following code:

[https://github.com/apache/hbase/blob/12c19a6e5105d898e93e385e0cded5eabceb8a40/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3552-L3558]

I can confirm that removing the condition fixes the issue. Is the comment "_This is used to keep compatible with the old scan implementation_" still valid?
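
The effect can be illustrated with a toy model in plain Ruby. This is not HBase code: every name in it is hypothetical, and it assumes the server tells the client to stop the whole scan only when the filter is done and the current batch of results is empty, which is one reading of the condition described above. Each region gets a fresh page counter, mirroring a filter that is applied per region.

```ruby
# Toy model, not HBase code. All names here are hypothetical.
PAGE_SIZE = 30

# 1000 row keys built like the shell example (zero-padded, then reversed),
# grouped by first character to mimic the 10 regions split at '1'..'9'.
ROWS = (0...1000).map { |i| format('%04d', i).reverse }.sort
REGIONS = ROWS.group_by { |key| key[0] }.values

# Scan one region with a fresh page counter.
# Returns the rows and a flag telling the client whether to keep scanning.
def scan_region(rows, require_empty_results:)
  results = rows.first(PAGE_SIZE)
  filter_done = results.size >= PAGE_SIZE
  # ASSUMPTION: with the condition in place, the scan is only stopped
  # when the filter is done AND this batch of results is empty.
  stop = require_empty_results ? (filter_done && results.empty?) : filter_done
  [results, !stop]
end

# Client loop: move on to the next region until told to stop.
def scan_table(require_empty_results:)
  collected = []
  REGIONS.each do |rows|
    results, more_results = scan_region(rows, require_empty_results: require_empty_results)
    collected.concat(results)
    break unless more_results
  end
  collected
end

with_condition = scan_table(require_empty_results: true).size
without_condition = scan_table(require_empty_results: false).size
puts "with isEmpty condition:    #{with_condition} rows"
puts "without isEmpty condition: #{without_condition} rows"
```

In this model the page is filled in the first region, but the batch is not empty, so the stop signal is never sent and the client visits every region. Dropping the emptiness check ends the scan after the first region.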

I'll upload a patch to see how it affects the existing test cases.


