Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2023/01/06 23:15:00 UTC

[jira] [Commented] (HBASE-27558) Scan quotas and limits should account for total block IO

    [ https://issues.apache.org/jira/browse/HBASE-27558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655600#comment-17655600 ] 

Bryan Beaudreault commented on HBASE-27558:
-------------------------------------------

As of HBASE-18294, the ScannerContext dataSize and heapSize fields are nearly identical. dataSize is cell.getSerializedSize() + Bytes.SIZEOF_INT, per PrivateCellUtil.estimatedSerializedSizeOf. heapSize is cell.getSerializedSize() + FIXED_OVERHEAD, per all of the Cell implementations of that method. The fixed overhead is typically on the order of 50-60 bytes, depending on the extra fields in each object. Tracking two such similar values adds little information, and from a read perspective the heapSize is actually incorrect.
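To illustrate how close the two values are, here is a minimal sketch. The constants and helper methods below are stand-ins, not the real HBase code: FIXED_OVERHEAD varies by Cell implementation, and 56 is just an assumed value.

```java
// Illustrative sketch only: stand-ins for HBase's per-cell size accounting.
// Real values come from PrivateCellUtil.estimatedSerializedSizeOf and each
// Cell implementation's heapSize(); the FIXED_OVERHEAD here is assumed.
public class SizeSketch {
    static final int SIZEOF_INT = 4;        // Bytes.SIZEOF_INT
    static final long FIXED_OVERHEAD = 56;  // hypothetical per-cell object overhead

    // dataSize contribution per cell: serialized bytes plus a length int
    static long dataSize(long serializedSize) {
        return serializedSize + SIZEOF_INT;
    }

    // heapSize contribution per cell: serialized bytes plus fixed overhead
    static long heapSize(long serializedSize) {
        return serializedSize + FIXED_OVERhead(serializedSize);
    }

    static long FIXED_OVERhead(long ignored) {
        return FIXED_OVERHEAD;
    }

    public static void main(String[] args) {
        long serialized = 100;
        // The two counters differ only by a per-cell constant, so tracking
        // both tells us little more than tracking one.
        System.out.println(dataSize(serialized));
        System.out.println(heapSize(serialized));
    }
}
```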

On the server side, the actual memory retained for a read must include the actual length of the block(s) backing those cells. The full blocks are held in memory until the request is finished and they are released. So for ScannerContext I suggest we increment heapSize by cell.heapSize() - cell.getSerializedSize(). We’d also increment it by blockSize for each block loaded (and retained) during the request.
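A sketch of the proposed accounting follows. The class and method names are hypothetical, not the actual ScannerContext API; the point is that retained memory is blocks plus per-cell object overhead, since a cell's bytes live inside its backing block.

```java
// Hypothetical sketch of the proposed server-side memory accounting for a
// scan. Names are illustrative; the real ScannerContext API differs.
public class ScanHeapAccounting {
    long heapSize = 0;

    // Per cell in the response: count only the object overhead, because the
    // cell's serialized bytes are counted via its backing block below.
    void incrementForCell(long cellHeapSize, long cellSerializedSize) {
        heapSize += cellHeapSize - cellSerializedSize;
    }

    // Per block loaded and retained for the lifetime of the request.
    void incrementForRetainedBlock(long blockSize) {
        heapSize += blockSize;
    }

    public static void main(String[] args) {
        ScanHeapAccounting acct = new ScanHeapAccounting();
        acct.incrementForRetainedBlock(65536); // one retained 64 KB block
        acct.incrementForCell(160, 100);       // one cell from that block
        // heapSize now reflects the block plus per-cell overhead, rather
        // than double-counting the cell bytes already inside the block.
        System.out.println(acct.heapSize);
    }
}
```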

Additionally, I will add a new "blockSize" field to ScannerContext, which will be incremented for all blocks read during the request (not just those retained). The difference between this and heapSize depends on how many of the requested blocks could be released early due to filters (see HBASE-27227).
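The distinction between bytes scanned and bytes retained can be sketched as two counters. Again, the names below are hypothetical, not the actual ScannerContext fields:

```java
// Hypothetical sketch: a counter for ALL block bytes read during a request,
// alongside a counter for only the blocks still retained. With early block
// release for filtered rows (HBASE-27227), bytes scanned can exceed bytes
// retained.
public class BlockIoAccounting {
    long blockBytesScanned = 0;   // every block read during the request
    long retainedBlockBytes = 0;  // only blocks held until the response ships

    void onBlockRead(long blockSize, boolean retained) {
        blockBytesScanned += blockSize;
        if (retained) {
            retainedBlockBytes += blockSize;
        }
    }

    public static void main(String[] args) {
        BlockIoAccounting acct = new BlockIoAccounting();
        acct.onBlockRead(65536, true);  // block backing returned cells
        acct.onBlockRead(65536, false); // block fully filtered, released early
        // Both blocks count toward IO-based quotas, but only the first
        // counts toward retained server memory.
        System.out.println(acct.blockBytesScanned);
        System.out.println(acct.retainedBlockBytes);
    }
}
```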

> Scan quotas and limits should account for total block IO
> --------------------------------------------------------
>
>                 Key: HBASE-27558
>                 URL: https://issues.apache.org/jira/browse/HBASE-27558
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> Scan and Multi requests pull the byte throughput limit from Quotas.getReadAvailable(). Multis validate the result inline in RSRpcServices by checking the accumulated {{RpcCallContext.getResponseCellSize}} and {{getResponseBlockSize}} against the read available after each action. Scans make use of {{ScannerContext}} and only check the total cell serialized size and {{cell.heapSize()}}.
> The handling for Multis was added in HBASE-14978. The block size is checked because regardless of the actual cell size, the regionserver needs to retain the entire blocks backing those cells for the lifetime of a request. If the retained blocks grow too large, a regionserver can OOM or experience heavy GC pressure.
> So multis validate read available against the actual block size retained for the responses, but scans only account for cell sizes. We should extend the same block support to scans through ScannerContext tracking block bytes scanned.
> Large scans can read over ranges of both returned and filtered rows. Regardless of what's returned to the user, the server-side cost of the scan is impacted just as much by filtered rows as by non-filtered rows.
> Both Scans and Multis take the Math.min of the Quota's read available and hbase.server.scanner.max.result.size. Scans further take the min of that and Scan.setMaxResultSize.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)