Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2022/12/27 21:30:00 UTC

[jira] [Updated] (HBASE-27227) Long running heavily filtered scans hold up too many ByteBuffAllocator buffers

     [ https://issues.apache.org/jira/browse/HBASE-27227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault updated HBASE-27227:
--------------------------------------
    Attachment: Screen Shot 2022-12-27 at 4.25.31 PM.png
      Assignee: Bryan Beaudreault
        Labels: patch-available  (was: )
        Status: Patch Available  (was: Open)

I've attached a draft PR: [https://github.com/apache/hbase/pull/4940]

I decided to skip quotas for now and instead fix the problem at hand. I'll handle quotas in a separate issue, since I think that's still valuable.

The PR eagerly releases blocks whose cells have all been skipped by filters. For example, if a block contains cells for 10 rows and all 10 rows are filtered out, the block is released immediately instead of being held until the response is shipped.
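A minimal sketch of the idea follows. It uses a simplified model in which each block is a reference-counted buffer and the scanner is told when it crosses a block boundary. The names ({{RefCountedBlock}}, {{FilteredScanSketch}}, {{onBlockBoundary}}, {{onCell}}, {{simulate}}) are hypothetical. They are not HBase's {{ByteBuffAllocator}} or scanner API, and this is not the code in the PR. A block from which no cell was returned is released as soon as the scanner moves past it. Blocks that contributed cells are still held until {{shipped()}}.

```java
import java.util.ArrayList;
import java.util.List;

final class FilteredScanSketch {

    /** Hypothetical stand-in for a pooled, reference-counted block buffer. */
    static final class RefCountedBlock {
        final int id;
        private int refCnt = 1; // reference held by the scanner while it reads the block

        RefCountedBlock(int id) { this.id = id; }

        void release() {
            if (refCnt <= 0) throw new IllegalStateException("double release of block " + id);
            refCnt--;
        }

        boolean isReleased() { return refCnt == 0; }
    }

    private final List<RefCountedBlock> heldUntilShipped = new ArrayList<>();
    private RefCountedBlock current;
    private boolean currentContributedCells;
    private int eagerlyReleased = 0;

    /** Called when the scanner moves on to a new block. */
    void onBlockBoundary(RefCountedBlock next) {
        finishCurrentBlock();
        current = next;
        currentContributedCells = false;
    }

    /** Called for every cell read from the current block; included means it passed the filters. */
    void onCell(boolean included) {
        if (included) currentContributedCells = true;
    }

    /** Decides the fate of the block the scanner just left. */
    void finishCurrentBlock() {
        if (current == null) return;
        if (currentContributedCells) {
            // Cells from this block may be referenced by the pending response: keep it.
            heldUntilShipped.add(current);
        } else {
            // Every cell was filtered out, so nothing references this block: release it now.
            current.release();
            eagerlyReleased++;
        }
        current = null;
    }

    /** Mirrors shipper.shipped(): the response has been sent, so release everything still held. */
    void shipped() {
        for (RefCountedBlock b : heldUntilShipped) b.release();
        heldUntilShipped.clear();
    }

    /**
     * Scans numBlocks blocks of rowsPerBlock rows each; only row 0 of needleBlock matches.
     * Returns {eagerly released, held before ship, still unreleased after ship}.
     */
    static int[] simulate(int numBlocks, int rowsPerBlock, int needleBlock) {
        FilteredScanSketch scan = new FilteredScanSketch();
        List<RefCountedBlock> blocks = new ArrayList<>();
        for (int i = 0; i < numBlocks; i++) {
            RefCountedBlock b = new RefCountedBlock(i);
            blocks.add(b);
            scan.onBlockBoundary(b);
            for (int row = 0; row < rowsPerBlock; row++) {
                scan.onCell(i == needleBlock && row == 0);
            }
        }
        scan.finishCurrentBlock(); // the scanner leaves the last block of this batch
        int heldBeforeShip = scan.heldUntilShipped.size();
        scan.shipped();
        int unreleased = 0;
        for (RefCountedBlock b : blocks) if (!b.isReleased()) unreleased++;
        return new int[] {scan.eagerlyReleased, heldBeforeShip, unreleased};
    }

    public static void main(String[] args) {
        int[] r = simulate(1000, 10, 500);
        System.out.println("eagerly released=" + r[0] + ", held until ship=" + r[1]
                + ", unreleased after ship=" + r[2]);
    }
}
```

Running {{main}} prints {{eagerly released=999, held until ship=1, unreleased after ship=0}}. Only the block containing the matching row is retained until the response is shipped.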

I've attached a screenshot comparing two identical clusters with identical data, both running the same "needle in a haystack" scan where most rows are filtered. One cluster has this patch and the other does not. The cluster with the patch allocates all blocks from the pool, while the unpatched cluster falls back to the heap for about 50% of its allocations.

> Long running heavily filtered scans hold up too many ByteBuffAllocator buffers
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-27227
>                 URL: https://issues.apache.org/jira/browse/HBASE-27227
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>         Attachments: Screen Shot 2022-07-20 at 10.52.40 AM.png, Screen Shot 2022-12-27 at 4.25.31 PM.png
>
>
> We have a workload which launches long-running scans searching for a needle in a haystack. They have a timeout of 60s, so they are allowed to run on the server for 30s. Most of the rows are filtered, and the final result is usually only a few KB.
> When these scans are running, we notice our ByteBuffAllocator pool usage goes to 100% and we start seeing 100+ MB/s of heap allocations. When the scans finish, the pool goes back to normal and heap allocations go away.
> My working theory is that we only release ByteBuffs once we call {{shipper.shipped()}}, which only happens once a response is returned to the user. This works fine for normal scans, which are likely to find enough results to return quickly. For long-running scans in which most of the results are filtered, however, we end up holding on to more and more buffers until the scan finally returns.
> We should consider whether it's possible to release buffers for blocks whose cells have been completely skipped by a scan.
>  
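The working theory in the description can be illustrated with a toy model. The model assumes a fixed-size pool that falls back to heap allocation once every pooled buffer is in use, which matches the reported symptom (pool usage at 100% plus heap allocations). The class name and the numbers in {{main}} are hypothetical, and this is not HBase's {{ByteBuffAllocator}} implementation. The model also simplifies to a single scan, whereas the real workload runs many such scans concurrently.

```java
final class ShipOnlyPoolModel {
    private final int poolCapacity;
    private int pooledInUse = 0;
    private long heapAllocations = 0;

    ShipOnlyPoolModel(int poolCapacity) { this.poolCapacity = poolCapacity; }

    /** Hands out a pooled buffer if one is free; otherwise falls back to a heap allocation. */
    private boolean allocate() {
        if (pooledInUse < poolCapacity) {
            pooledInUse++;
            return true;
        }
        heapAllocations++;
        return false;
    }

    /**
     * One long-running scan reads numBlocks blocks before anything is returned to the client.
     * Buffers are only released in shipped(), so every pooled buffer stays in use until then.
     * Returns {peak pooled buffers in use, heap allocations, pooled buffers in use after ship}.
     */
    static long[] scanReleasingOnlyOnShip(int poolCapacity, int numBlocks) {
        ShipOnlyPoolModel pool = new ShipOnlyPoolModel(poolCapacity);
        int heldPooled = 0;
        int peak = 0;
        for (int i = 0; i < numBlocks; i++) {
            if (pool.allocate()) heldPooled++;
            peak = Math.max(peak, pool.pooledInUse);
        }
        // shipper.shipped(): the response is sent and all held buffers go back to the pool.
        pool.pooledInUse -= heldPooled;
        return new long[] {peak, pool.heapAllocations, pool.pooledInUse};
    }

    public static void main(String[] args) {
        long[] r = scanReleasingOnlyOnShip(100, 1000);
        System.out.println("peak pooled in use=" + r[0] + ", heap allocations=" + r[1]
                + ", pooled in use after ship=" + r[2]);
    }
}
```

With a 100-buffer pool and 1000 blocks read before the response is shipped, the model reports a peak of 100 pooled buffers in use and 900 heap allocations. The pool only drains once {{shipped()}} runs.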

