Posted to issues@hbase.apache.org by "shriram (Jira)" <ji...@apache.org> on 2021/08/09 06:54:00 UTC

[jira] [Created] (HBASE-26183) Size of the Result object while querying huge data from HBASE table

shriram created HBASE-26183:
-------------------------------

             Summary: Size of the Result object while querying huge data from HBASE table
                 Key: HBASE-26183
                 URL: https://issues.apache.org/jira/browse/HBASE-26183
             Project: HBase
          Issue Type: New Feature
          Components: scan
    Affects Versions: 1.1.13
            Reporter: shriram


 
I am trying to query an HBase table by row keys. We have the following structure:
 * an index table, which holds the row keys of the actual table
 * the actual table, which contains JSON data in compressed format

When I query HBase, I first have to scan the index table for row keys, using a Scan with some filters; this yields the row keys as byte arrays. Once we have the row keys, we build a list of Gets and pass it to Table.get(List<Get>). We then iterate over the returned results and prepare a list containing the compressed JSON objects. We do not know the size or the number of these objects in advance, so if the number of objects is huge we may run into an OOM. Is there any option to return an Iterator, or to buffer the results, so that we can avoid the OOM?
 {{for (byte[] rowkey : indexTableOutput) {
    Get get = new Get(rowkey).addFamily(Bytes.toBytes(columnFamilty)).setMaxVersions(MAX_VERSIONS);
    listOfget.add(get);
}}}
The code above builds the list of Gets from the row keys retrieved from the index table.
 {{TableName tableName = TableName.valueOf("table1");
Table tableObj = conn.getTable(tableName);
Result[] results = tableObj.get(listOfget);}}
Based on the code above, we have a few questions. Any help would be appreciated.
 * If there is a huge amount of data, will Result[] contain all the results?
 * How can we return an iterator-like object, so that processing is left to the consumer? Keeping all the data in memory while processing it will result in an OOM.
 * Are there any other options to return a limited amount of data, so that the consumer can process it and then continue?

I found that a ResultScanner is returned for Scan objects, but I could not find a comparable option for a list of Gets. In our case we know the exact keys from the index table.
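
To make the request concrete, here is a minimal sketch of the iterator-style behaviour we have in mind, written in plain Java with no HBase dependency. It assumes the row keys themselves fit in memory and that fetching them in fixed-size batches is acceptable. The class name BatchedFetchIterator, the batchSize parameter, and the fetcher function are hypothetical names used only for illustration; they are not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper (not an HBase class): walks a list of keys and
 * fetches the matching values one batch at a time, so that only a single
 * batch of values is held in memory at any moment.
 */
final class BatchedFetchIterator<K, V> implements Iterator<V> {
    private final List<K> keys;
    private final int batchSize;
    // Stand-in for the real lookup, e.g. a wrapper around a multi-get call.
    private final Function<List<K>, List<V>> fetcher;
    private int nextKeyIndex = 0;
    private Iterator<V> current = Collections.emptyIterator();

    BatchedFetchIterator(List<K> keys, int batchSize,
                         Function<List<K>, List<V>> fetcher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys;
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only when the current one is exhausted.
        // The loop also skips over batches that come back empty.
        while (!current.hasNext() && nextKeyIndex < keys.size()) {
            int end = Math.min(nextKeyIndex + batchSize, keys.size());
            List<K> batch = new ArrayList<>(keys.subList(nextKeyIndex, end));
            current = fetcher.apply(batch).iterator();
            nextKeyIndex = end;
        }
        return current.hasNext();
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

In our case the fetcher would turn each batch of row keys into a list of Gets and call Table.get(List<Get>) on it, wrapping the IOException that call can throw. The consumer would then hold at most one batch of Result objects at a time instead of the full Result[]. The batch size is a tuning value we would have to choose based on the expected size of the compressed JSON objects.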



--
This message was sent by Atlassian Jira
(v8.3.4#803005)