Posted to user@hbase.apache.org by "Oliver Meyn (GBIF)" <om...@gbif.org> on 2012/10/25 10:24:51 UTC

resource usage of ResultScanner's Iterator

Hi all,

I'm on cdh3u3 (hbase 0.90.4) and I need to provide a bunch of row keys based on a column value (e.g. give me all keys where column "dataset" = 1234).  That's straightforward using a scan and filter.  The trick is that I want to return an Iterator over my key type (Integer) rather than expose HBase internals (i.e. Result), so I need some kind of Decorator that wraps the Iterator<Result>.  For every call to next() I'd then call the underlying iterator's next() and extract my Integer key from the Result.  That all works fine, but what I'm wondering is what resources the Iterator<Result> is holding, and how I can release those from my decorator.

In my current implementation the decorator's constructor looks like:

public OccurrenceKeyIterator(HTablePool tablePool, String occurrenceTableName, Scan scan)

and the constructor builds the ResultScanner and the iterator over it.  In my hasNext() method I can check the underlying iterator, and if it returns false I can shut down my scanner and return the table to the HTablePool. But what if the end user never reaches the end of the Iterator, or just dereferences it? Am I at risk of leaking tables, connections, or anything else?  Any tips on what I should do?
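
A minimal sketch of the decorator described above, written against plain Java interfaces rather than HBase classes so that it runs on its own. The names KeyIterator and resource are hypothetical; in real code the resource would be whatever closes the ResultScanner and returns the table to the pool. It assumes Java 8 or later, which is newer than what an HBase 0.90 deployment would typically have used.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical decorator: maps each element of a source iterator to a key and
// releases the backing resource on exhaustion or on an explicit close().
class KeyIterator<R, K> implements Iterator<K>, Closeable {
  private final Iterator<R> source;
  private final Function<R, K> toKey;
  private final Closeable resource; // stands in for "close scanner, return table"
  private boolean closed;

  KeyIterator(Iterator<R> source, Function<R, K> toKey, Closeable resource) {
    this.source = source;
    this.toKey = toKey;
    this.resource = resource;
  }

  @Override
  public boolean hasNext() {
    if (closed) {
      return false;
    }
    if (source.hasNext()) {
      return true;
    }
    // Normal case: the caller iterated to the end, so clean up right away.
    try {
      close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return false;
  }

  @Override
  public K next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return toKey.apply(source.next());
  }

  // Idempotent: the resource is released at most once.
  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    resource.close();
  }
}
```

This does not help a caller who simply drops the iterator, but because it implements Closeable a caller who stops early can release the resources with close() or a try-with-resources block.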

Thanks,
Oliver

--
Oliver Meyn
Software Developer
Global Biodiversity Information Facility (GBIF)
+45 35 32 15 12
http://www.gbif.org


Re: resource usage of ResultScanner's Iterator

Posted by "Oliver Meyn (GBIF)" <om...@gbif.org>.
On 2012-10-26, at 9:59 PM, Stack wrote:

> 
> If the close is not called, this is what will be missed on the HTable instance:
> 
> 
>    flushCommits();
>    if (cleanupPoolOnClose) {
>      this.pool.shutdown();
>    }
>    if (cleanupConnectionOnClose) {
>      if (this.connection != null) {
>        this.connection.close();
>      }
>    }
>    this.closed = true;
> 
> 
> In your case, the flushing of commits is of no import.
> 
> The pool above is an executor service inside of HTable used for doing
> batch calls.  Again, you don't really use it, but it should probably get
> cleaned up.
> 
> The connection close is good because though all HTables share a
> Connection, the above close updates reference counters so we know when
> we can let go of the connection.
> 
> Keep a list of what you've given out and if unused in N minutes, close
> it yourself in background?

This kind of thing was all I could come up with, but it feels a bit messy.  It sounds like the only real consequence of not closing nicely is that the reference counter doesn't get decremented, meaning the Connection wouldn't get garbage collected if it were dereferenced.  Is that right?  That doesn't sound too bad to me since the pool will be holding on to that connection anyway, right? (Keeping in mind that the normal use case is that everything gets cleaned up when the end user finishes iterating.)

> (when you fellas going to upgrade?)

It's definitely in the plan, but it keeps getting pushed down in favour of getting work done :)  I read in the javadoc that the behaviour of HTablePool and of closing a table changes in newer HBase versions. Does my use case here change too (i.e. is it even less dangerous to leave a table hanging in newer HBase)?

Thanks a lot for digging into this, Stack!

Oliver



Re: resource usage of ResultScanner's Iterator

Posted by Stack <st...@duboce.net>.

If the close is not called, this is what will be missed on the HTable instance:


    flushCommits();
    if (cleanupPoolOnClose) {
      this.pool.shutdown();
    }
    if (cleanupConnectionOnClose) {
      if (this.connection != null) {
        this.connection.close();
      }
    }
    this.closed = true;


In your case, the flushing of commits is of no import.

The pool above is an executor service inside of HTable used for doing
batch calls.  Again, you don't really use it, but it should probably get
cleaned up.

The connection close is good because though all HTables share a
Connection, the above close updates reference counters so we know when
we can let go of the connection.
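
To make the reference-counting point concrete, here is a rough sketch of the general pattern. It is not HBase's actual implementation; SharedConnection and TableHandle are hypothetical stand-ins. It shows why a handle that is never closed keeps the shared connection open.

```java
// Hypothetical stand-in for a connection shared by many table handles.
class SharedConnection {
  private int refCount;
  private boolean open;

  // The first acquirer opens the connection.
  synchronized void acquire() {
    if (refCount++ == 0) {
      open = true;
    }
  }

  // The last releaser closes the connection.
  synchronized void release() {
    if (refCount == 0) {
      throw new IllegalStateException("release without acquire");
    }
    if (--refCount == 0) {
      open = false;
    }
  }

  synchronized boolean isOpen() {
    return open;
  }

  synchronized int refCount() {
    return refCount;
  }
}

// Hypothetical table handle: takes a reference when created, gives it back on close().
class TableHandle implements AutoCloseable {
  private final SharedConnection connection;
  private boolean closed;

  TableHandle(SharedConnection connection) {
    this.connection = connection;
    connection.acquire();
  }

  @Override
  public void close() {
    if (!closed) {
      closed = true;
      connection.release();
    }
  }
}
```

If one of two handles is closed and the other is simply dropped, the count stays at 1 and the connection is never released, even after the dropped handle has been garbage collected.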

Keep a list of what you've given out and if unused in N minutes, close
it yourself in background?
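
One possible shape for that background cleanup, as a sketch only. IdleReaper, touch, forget, and reapOnce are hypothetical names, and the clock is injected so the idle logic can be exercised without waiting for real time to pass.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical tracker: remembers when each handed-out resource was last used
// and closes the ones that have been idle for too long.
class IdleReaper {
  private final Map<Closeable, Long> lastUsed = new ConcurrentHashMap<>();
  private final long maxIdleMillis;
  private final LongSupplier clock;

  IdleReaper(long maxIdleMillis, LongSupplier clock) {
    this.maxIdleMillis = maxIdleMillis;
    this.clock = clock;
  }

  // Record that a resource was handed out or has just been used.
  void touch(Closeable resource) {
    lastUsed.put(resource, clock.getAsLong());
  }

  // Stop tracking a resource that its owner closed normally.
  void forget(Closeable resource) {
    lastUsed.remove(resource);
  }

  // Close and drop every resource idle for at least maxIdleMillis.
  // Returns the number of resources closed successfully.
  int reapOnce() {
    long now = clock.getAsLong();
    int reaped = 0;
    Iterator<Map.Entry<Closeable, Long>> it = lastUsed.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Closeable, Long> entry = it.next();
      Closeable resource = entry.getKey();
      if (now - entry.getValue() >= maxIdleMillis) {
        it.remove();
        try {
          resource.close();
          reaped++;
        } catch (IOException e) {
          // A real implementation would log this; the entry is dropped either way.
        }
      }
    }
    return reaped;
  }
}
```

In use, the iterator would call touch on every next() and forget when it closes itself, and a ScheduledExecutorService could run reapOnce at a fixed interval with System::currentTimeMillis as the clock. The sketch does not guard against a resource being touched at the same moment it is reaped, so the resource's close() should be safe to call while the iterator is still referenced.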

Good on you Oliver (when you fellas going to upgrade?)

St.Ack