Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2021/12/09 21:59:00 UTC

[jira] [Resolved] (HBASE-26122) Limit max result size of individual Gets

     [ https://issues.apache.org/jira/browse/HBASE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault resolved HBASE-26122.
---------------------------------------
    Fix Version/s:     (was: 3.0.0-alpha-2)
                       (was: 2.6.0)
     Release Note:   (was: Can now call Get.setMaxResultSize(). When set to a positive value, the server will return the results when that threshold is met. This may result in partial results for large rows, so the caller is expected to handle the case where Result#mayHaveMoreCellsInRow() is true when setMaxResultSize is used. Possible options include paginating using PageFilter, reducing the returned data set using other filters, converting the Get to a Scan (which can take advantage of partial response stitching), or throwing a non-retryable exception if using this as a guardrail. See below for example usage in shell:

Create table

hbase:005:0> create 't1', 'f1'
Created table t1
Took 1.1306 seconds

Insert test data

hbase:012:0> put 't1', 'r1', 'f1:c1', 'a'
Took 0.0416 seconds
hbase:014:0> put 't1', 'r1', 'f1:c2', 'b'
Took 0.0059 seconds
hbase:015:0> put 't1', 'r1', 'f1:c3', 'c'
Took 0.0097 seconds

Get without setMaxResultSize, returns full row and mayHaveMoreCellsInRow = false

hbase:037:0> g = Get.new('r1'.to_s.to_java_bytes)
=> #<Java::OrgApacheHadoopHbaseClient::Get:0x11fa11b2>
hbase:038:0> result = @hbase.table('t1', @shell).instance_variable_get(:@table).get(g)
=> #<Java::OrgApacheHadoopHbaseClient::Result:0x217009bd>
hbase:039:0> result.mayHaveMoreCellsInRow
=> false
hbase:040:0> result.toString
=> "keyvalues={r1/f1:c1/1627498270850/Put/vlen=1/seqid=0, r1/f1:c2/1627498276326/Put/vlen=1/seqid=0, r1/f1:c3/1627498280413/Put/vlen=1/seqid=0}"

Get with setMaxResultSize, returns first two columns and mayHaveMoreCellsInRow = true

hbase:059:0> g = Get.new('r1'.to_s.to_java_bytes).setMaxResultSize(100)
=> #<Java::OrgApacheHadoopHbaseClient::Get:0x5ed88e31>
hbase:060:0> result = @hbase.table('t1', @shell).instance_variable_get(:@table).get(g)
=> #<Java::OrgApacheHadoopHbaseClient::Result:0x574e4184>
hbase:061:0> result.mayHaveMoreCellsInRow
=> true
hbase:062:0> result.toString
=> "keyvalues={r1/f1:c1/1627498270850/Put/vlen=1/seqid=0, r1/f1:c2/1627498276326/Put/vlen=1/seqid=0}")
       Resolution: Won't Fix

> Limit max result size of individual Gets
> ----------------------------------------
>
>                 Key: HBASE-26122
>                 URL: https://issues.apache.org/jira/browse/HBASE-26122
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, regionserver
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>
> Scans have the ability to have a configured max result size, which causes them to return a partial result once the limit has been reached. MultiGets also can throw MultiActionResultTooLarge if the response size is over a configured quota. Neither of these really accounts for a single Get of a too-large row. Such too-large Gets can cause substantial GC pressure or worse if sent at volume.
> Currently one can work around this by converting the Get to a single-row Scan, but this requires a developer either to know about the issue in advance and use a Scan upfront, or to wait for the RegionServer to choke on a large request and only then rewrite the Get for future requests.
> We should implement the same response size limits for Get as for Scan, whereby the server returns a partial result to the client for handling.
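
The partial-result behavior proposed above can be pictured with a small toy model. This is only a sketch in plain Ruby and uses no HBase classes: Cell, PartialResult, read_row and the per-cell sizes are hypothetical names and values chosen for illustration, and the real server-side size accounting may well differ. It assumes the limit is checked after each cell is added, which matches the release note wording that the server returns the results "when that threshold is met".

```ruby
# Toy model only: plain Ruby, no HBase classes involved.
# Cell, PartialResult and read_row are hypothetical names for illustration.
Cell = Struct.new(:qualifier, :size_bytes)
PartialResult = Struct.new(:cells, :may_have_more_cells_in_row)

# Collect cells in order. Once the accumulated size meets the limit, stop and
# flag the result as partial if any cells are left in the row.
# A non-positive limit is treated as "no limit": the whole row is returned.
def read_row(row_cells, max_result_size)
  collected = []
  total = 0
  row_cells.each_with_index do |cell, i|
    collected << cell
    total += cell.size_bytes
    cells_remain = i < row_cells.length - 1
    if max_result_size > 0 && total >= max_result_size && cells_remain
      return PartialResult.new(collected, true)
    end
  end
  PartialResult.new(collected, false)
end

# Assumed per-cell sizes, picked so that a limit of 100 is met after two cells.
row = [Cell.new('c1', 50), Cell.new('c2', 50), Cell.new('c3', 50)]

full = read_row(row, 0)       # no limit: the whole row comes back
partial = read_row(row, 100)  # limit met after c2: c3 is left for a follow-up

puts full.cells.map(&:qualifier).inspect
puts partial.cells.map(&:qualifier).inspect
puts partial.may_have_more_cells_in_row
```

In this model a caller would inspect may_have_more_cells_in_row and then decide whether to page through the rest of the row, narrow the request with filters, or fail fast, which are the handling options listed in the withdrawn release note.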



--
This message was sent by Atlassian Jira
(v8.20.1#820001)