Posted to dev@calcite.apache.org by James Turton <dz...@apache.org> on 2022/04/11 05:32:59 UTC
CALCITE-4992 and adapter resource leaks
Hi Calcite devs!
There are resource leaks affecting some Calcite adapters, including ES
and Cassandra and probably some others, whereby Calcite internally
creates client objects from external libraries in order to access the
system in question and never closes those clients. The reason that there
is no trivial fix available is that it is only the application that
knows when said clients may be closed, and Calcite offers the
application no way to signal this to it.
The case I have real world information about, ES, is a serious problem
because the resource leak is unbounded. Here Calcite creates an ES
"RestClient" for every call to create() in the schema factory and the
RestClient leaks at least a file descriptor if it is not closed.
Operating systems enforce a per-process file descriptor quota. If your
application, Drill in our case, makes one too many calls to the ES
schema factory's create() method, then the JVM hosting it is summarily
executed by the OS.
In the case of Cassandra, the situation looks less bad to me in that
client objects are reused by Calcite. This means that if the
application only ever wants to talk to a finite number of distinct
Cassandra endpoints then the resource leak is bounded and, most likely,
quite small. In https://github.com/apache/calcite/pull/2698, I've been
revising a patch to the ES schema factory to introduce the same sort of
reuse to constrain, but not cure, the resource leak there. The patch is
unquestionably a nasty "band aid" and that prompted some discussion with
its reviewers and a request that I email this list.
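To illustrate what "reuse" means here, a minimal sketch follows. It is not the code from the pull request: FakeClient is a hypothetical stand-in for an external client such as the ES RestClient, and clientFor() is a hypothetical helper. It only shows that keying clients by endpoint bounds the number of open clients by the number of distinct endpoints, while still never closing them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ClientReuseSketch {
  /** Hypothetical stand-in for an external client that holds OS resources. */
  static final class FakeClient implements AutoCloseable {
    /** Counts clients that have been opened and not yet closed. */
    static final AtomicInteger OPEN = new AtomicInteger();
    final String endpoint;

    FakeClient(String endpoint) {
      this.endpoint = endpoint;
      OPEN.incrementAndGet();
    }

    @Override public void close() {
      OPEN.decrementAndGet();
    }
  }

  /** One client per distinct endpoint; entries are never evicted or closed. */
  private static final Map<String, FakeClient> CLIENTS = new ConcurrentHashMap<>();

  /** Hypothetical helper: returns the cached client for an endpoint, creating it once. */
  static FakeClient clientFor(String endpoint) {
    return CLIENTS.computeIfAbsent(endpoint, FakeClient::new);
  }

  public static void main(String[] args) {
    // Many "create()" calls against the same endpoint share a single client.
    for (int i = 0; i < 1000; i++) {
      clientFor("http://es-a:9200");
    }
    clientFor("http://es-b:9200");
    System.out.println(FakeClient.OPEN.get()); // prints 2
  }
}
```

The leak is still there, since nothing ever calls close(), but it no longer grows with the number of create() calls.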
I think Calcite might have to make a design decision, perhaps one that
means that either
* it abstains from connection management entirely in adapters, which
might break some of its public APIs since then applications must
start to pass connections in (or might they be smuggled inside the
operand Map?) or
* it starts to use connections in a single-use way, freeing them
immediately and taking a performance hit or
* it makes Schema objects, or some other objects, closeable by the
application and propagates these close events to the adapter code
responsible for managing connections.
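As a rough sketch of the third option only, under stated assumptions: CloseableSchema, FakeClient and create() below are hypothetical names, not existing Calcite or ES APIs. The idea is that the schema owns its client and the application, which knows when it is done, closes the schema.

```java
public class CloseableSchemaSketch {
  /** Hypothetical stand-in for an external client that holds OS resources. */
  static final class FakeClient implements AutoCloseable {
    private boolean closed;

    @Override public void close() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  /** Hypothetical schema that owns its client and releases it on close(). */
  static final class CloseableSchema implements AutoCloseable {
    private final FakeClient client;

    CloseableSchema(FakeClient client) {
      this.client = client;
    }

    FakeClient client() {
      return client;
    }

    /** Propagates the application's close signal to the adapter's client. */
    @Override public void close() {
      client.close();
    }
  }

  /** Hypothetical factory method, loosely mirroring a schema factory's create(). */
  static CloseableSchema create(String endpoint) {
    return new CloseableSchema(new FakeClient());
  }

  public static void main(String[] args) {
    CloseableSchema schema = create("http://es-a:9200");
    FakeClient client = schema.client();
    try {
      System.out.println(client.isClosed()); // prints false
    } finally {
      schema.close(); // the application signals that it is done
    }
    System.out.println(client.isClosed()); // prints true
  }
}
```

Whether the closeable object should be the schema itself or something else is exactly the open design question; the sketch only shows the ownership and propagation.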
Thanks
James
Re: CALCITE-4992 and adapter resource leaks
Posted by Julian Hyde <jh...@gmail.com>.
I don’t think a lot of thought went into how adapters use native objects. For any given adapter we could start a discussion about how to manage the lifecycle (e.g. a pool or cache or factory).