Posted to user@hbase.apache.org by Ryan Rawson <ry...@gmail.com> on 2009/12/02 08:26:08 UTC

Re: reading a row with lots of cells (wide-table) causing out-of-memory error

Answers to your questions inline:


On Tue, Dec 1, 2009 at 11:06 PM, Sujee Maniyam <su...@sujee.net> wrote:
> Hi all,
>
> I have the following table
>     user_id   => {  "ip_address",   "ref_url" }
> column qualifiers are timestamps.  Created with default options
> (BLOCKSIZE => '65536', ...etc)
>
> so a typical row looks like:
>     'user1'  => {
>                           ip_address:t1  => value1
>                           ip_address:t2  => value2
>                           ref_url:t2 =>  value3
>                     }
> I have a few million rows in the table.  Trying to write a simple java client.
>
> When I query for a user_id that has around 2-million 'values' (unique
> timestamps) it is causing a region server to die with Out-of-memory
> error.
>
> code-snippet for client:
>
> // ----------
> // ---- http://pastebin.com/m75fc75d1
>
> Get get = new Get(key);
> Result r = table.get(get);
>
> String[] families = {"ip_address", "ref_url"};
> for (String family : families) {
>   NavigableMap<byte[], byte[]> familyMap =
> r.getFamilyMap(Bytes.toBytes(family));
>   System.out.println(String.format("    %s #cells : %d", family,
> familyMap.size()));
> }
> // ----------
>
>
> I am curious to know...
> 1) is the above code doing something wrong?

No, it looks ok.

> 2) does a row's data have to fit completely into memory?

Yes, into both server and client memory.

> 3) I will want to iterate through all the cell values, wondering what
> is the best way to do that?

0.21 will have an API that allows partial row scans.  In the
meantime, you could try several things:
- use more rows instead of columns
- use more families, query on families
- filters can choose what to pick based on column name.
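To illustrate the last suggestion: in HBase the filtering runs server-side, so
only matching cells are returned to the client (note that filtering on the
client after a full Get would not help with the region server's memory).  Here
is a minimal, self-contained sketch of the selection logic itself -- keeping
only cells whose qualifier starts with a given prefix -- using plain Java maps
rather than the HBase client API (the class and method names are made up for
this example):

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class QualifierPrefixDemo {

    // Keep only the cells whose column qualifier starts with the given
    // prefix.  This mirrors what a server-side qualifier filter decides
    // per cell; qualifiers here are Strings for readability (HBase uses
    // byte[]).
    static NavigableMap<String, String> filterByPrefix(
            NavigableMap<String, String> familyMap, String prefix) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : familyMap.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A toy family map like the one in the original post:
        // qualifiers are timestamps, values are the logged data.
        NavigableMap<String, String> fam = new TreeMap<>();
        fam.put("t1", "value1");
        fam.put("t2", "value2");
        fam.put("t20", "value3");

        // Select only the qualifiers starting with "t2".
        System.out.println(filterByPrefix(fam, "t2").size());  // prints 2
    }
}
```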

> 4) if this is the limitation for 'wide tables', then I will redesign
> the table to use composite keys ( row = userid + timestamp)

It's a limitation of the API which forces us to materialize the entire
row in memory at one time.
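One detail worth getting right if you go the composite-key route: encode the
timestamp as a fixed-width big-endian long, so that HBase's lexicographic
byte-wise row ordering matches chronological order within each user.  A
minimal sketch in plain Java (class and method names are hypothetical, not
from the HBase API):

```java
import java.nio.ByteBuffer;

public class CompositeKey {

    // Build a composite row key: userid bytes followed by an 8-byte
    // big-endian timestamp.  Fixed width + big-endian means byte-wise
    // comparison of keys orders rows chronologically per user.
    static byte[] makeKey(String userId, long timestamp) {
        byte[] user = userId.getBytes();
        return ByteBuffer.allocate(user.length + 8)
                .put(user)
                .putLong(timestamp)  // ByteBuffer writes big-endian by default
                .array();
    }

    // Unsigned byte-wise comparison, the same ordering HBase applies
    // to row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] earlier = makeKey("user1", 1000L);
        byte[] later = makeKey("user1", 2000L);
        System.out.println(compareUnsigned(earlier, later) < 0);  // prints true
    }
}
```

With this layout, each cell becomes its own narrow row, and a Scan starting at
the userid prefix walks a user's history in time order without ever
materializing it all at once.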

>
> thanks so much for your help.
> Sujee Maniyam
>
> --
> http://sujee.net
>