Posted to dev@calcite.apache.org by James Turton <dz...@apache.org> on 2022/04/11 05:32:59 UTC

CALCITE-4992 and adapter resource leaks

Hi Calcite devs!

There are resource leaks affecting some Calcite adapters, including ES 
and Cassandra and probably some others, whereby Calcite internally 
creates client objects from external libraries in order to access the 
system in question and never closes those clients. There is no trivial 
fix available because only the application knows when said clients may 
be closed, and Calcite offers the application no way to signal this.

The case I have real world information about, ES, is a serious problem 
because the resource leak is unbounded.  Here Calcite creates an ES 
"RestClient" for every call to create() in the schema factory and the 
RestClient leaks at least a file descriptor if it is not closed.  
Operating systems enforce a per-process file descriptor quota.  If your 
application, Drill in our case, makes one too many calls to the ES 
schema factory's create() method, then the JVM hosting it is summarily 
executed by the OS.

In the case of Cassandra, the situation looks less bad to me in that 
client objects are reused by Calcite.  This means that if the 
application only ever wants to talk to a finite number of distinct 
Cassandra endpoints then the resource leak is bounded and, most likely, 
quite small.  In https://github.com/apache/calcite/pull/2698, I've been 
revising a patch to the ES schema factory to introduce the same sort of 
reuse to constrain, but not cure, the resource leak there.  The patch is 
unquestionably a nasty "band aid" and that prompted some discussion with 
its reviewers and a request that I email this list.
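
To illustrate the kind of reuse described above, here is a minimal sketch, not the code in the pull request. StubClient and CachingSchemaFactory are hypothetical names invented for this example, and StubClient merely stands in for an external client such as the ES RestClient. The sketch keeps one client per distinct endpoint, so repeated calls to create() no longer open a new client each time. Nothing is ever closed, so the leak is bounded rather than cured.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an external client such as the ES RestClient. */
final class StubClient implements AutoCloseable {
  /** Counts clients that have been opened and not yet closed. */
  static final AtomicInteger OPEN = new AtomicInteger();

  final String endpoint;

  StubClient(String endpoint) {
    this.endpoint = endpoint;
    OPEN.incrementAndGet();
  }

  @Override public void close() {
    OPEN.decrementAndGet();
  }
}

/** Hypothetical schema factory that reuses one client per distinct endpoint. */
final class CachingSchemaFactory {
  private final Map<String, StubClient> clients = new ConcurrentHashMap<>();

  /** Returns the cached client for the endpoint, creating it on first use. */
  StubClient create(String endpoint) {
    return clients.computeIfAbsent(endpoint, StubClient::new);
  }
}
```

With this shape, a thousand calls to create() for the same endpoint hold one open client instead of a thousand, but the factory still has no way to learn when that client may be closed.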

I think Calcite might have to make a design decision, perhaps one that 
means that either

  * it abstains from connection management entirely in adapters, which
    might break some of its public APIs since then applications must
    start to pass connections in (or might they be smuggled inside the
    operand Map?) or
  * it starts to use connections in a single-use way, freeing them
    immediately and taking a performance hit or
  * it makes Schema, or some other, objects closeable by the application
    and propagates these events to the adapter code responsible for
    managing connections.
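
The third option could look roughly like the following sketch. It rests on assumptions: TrackedClient and CloseableSchema are hypothetical names invented here, and this is not an existing Calcite API. The schema object owns the client it was built with, and the application signals the end of its lifetime by closing the schema, which propagates the close to the client.

```java
/** Hypothetical stand-in for an external client that holds OS resources. */
final class TrackedClient implements AutoCloseable {
  private boolean closed;

  @Override public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

/** Hypothetical schema that owns its client and releases it on close(). */
final class CloseableSchema implements AutoCloseable {
  private final TrackedClient client;

  CloseableSchema(TrackedClient client) {
    this.client = client;
  }

  TrackedClient client() {
    return client;
  }

  /** Propagates the application's close signal to the owned client. */
  @Override public void close() {
    client.close();
  }
}
```

An application could then scope a schema with try-with-resources, for example `try (CloseableSchema schema = new CloseableSchema(new TrackedClient())) { ... }`, and the client would be released when the block exits.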

Thanks

James

Re: CALCITE-4992 and adapter resource leaks

Posted by Julian Hyde <jh...@gmail.com>.
I don’t think a lot of thought went into how adapters use native objects.  For any given adapter we could start a discussion about how to manage the lifecycle (e.g. a pool or cache or factory).
